Abstract

Orthogonal coding schemes, known to asymptotically achieve the capacity per unit cost (CPUC) for 
single-user ergodic memoryless channels with a zero-cost input symbol, are investigated for single-user 
compound memoryless channels, which exhibit uncertainties in their input-output statistical relationships. 
A minimax formulation is adopted to attain robustness. First, a class of achievable rates per unit cost 
(ARPUC) is derived, and its utility is demonstrated through several representative case studies. Second, 
when the uncertainty set of channel transition statistics satisfies a convexity property, optimization is 
performed over the class of ARPUC through utilizing results of minimax robustness. The resulting CPUC 
lower bound indicates the ultimate performance of the orthogonal coding scheme, and coincides with the 
CPUC under certain restrictive conditions. Finally, still under the convexity property, it is shown that the 
CPUC can generally be achieved, through utilizing a so-called mixed strategy in which an orthogonal 
code contains an appropriate composition of different nonzero-cost input symbols. 
Abbreviations, Notations and Symbols

ARPUC : Achievable rate per unit cost, denoted R. 
CPUC : Capacity per unit cost, denoted C. 

KL distance : Kullback-Leibler distance, denoted $D(P\|Q)$ for two distributions $P$ and $Q$

$A := B$: Expression $A$ is defined by expression $B$

$(\cdot)^T$: Transpose of vector or matrix

$(\cdot)^\dagger$: Conjugate transpose of vector or matrix

$E_P[\cdot]$: Expectation with respect to distribution $P$

$\mathrm{tr}[\cdot]$: Trace of matrix

$\det(\cdot)$: Determinant of matrix

$\mathcal{N}(\cdot, \cdot)$: Gaussian distribution

$\mathcal{CN}(\cdot, \cdot)$: Circularly symmetric complex Gaussian distribution

Random variables: $X, Y, \ldots$

Random vectors/matrices: $\underline{X}, \underline{Y}, \ldots$

Realizations of random variables: $x, y, \ldots$

Realizations of random vectors/matrices: $\underline{x}, \underline{y}, \ldots$

Sets, or alphabets of random variables: $\mathcal{X}, \mathcal{Y}, \mathcal{S}, \ldots$

All logarithms are to the natural base e, and information units are measured in nats. 

I. Introduction 

The channel capacity quantifies the maximum amount of information transferable over a channel,
measured on the basis of unit channel use (for discrete-time channels) or unit time duration (for
continuous-time channels). Similarly, the channel capacity per unit cost (CPUC), initially investigated
in a systematic way in [1],¹ quantifies the maximum amount of information transferable over a channel,
measured on the basis of unit average input cost. The relevance of the CPUC to communication systems
stems from the fact that it directly quantifies the minimum cost necessary for reliably
transmitting/receiving a unit of information, e.g., the minimum energy per bit [3].

In [1], the CPUC of ergodic memoryless channels is established. Beyond a general single-letter
characterization of the CPUC, it is further shown that when a zero-cost (a.k.a. "free") symbol is present
in the input alphabet, the CPUC yields a particularly simple form involving the maximization of the
Kullback-Leibler (KL) distance between two conditional output distributions.

¹ The concept of CPUC was also explicitly proposed earlier in [2, Chap. 2, Sec. 1, Ex. 26].

In this paper, we consider the CPUC problem for channels with uncertain transition statistics. In 
particular, we adopt a single-user compound memoryless channel model. The compound channel model, in 
which the actual realization of the channel transition distributions is arbitrarily chosen from a parametrized 
uncertainty set and is not revealed to either the transmitter or the receiver, was initially introduced in [4],
[5]. The channel capacity was established therein;² it takes a minimax form and can be interpreted
as the outcome of a game between the input distribution and the channel realization.

By generalizing the proof of the CPUC formula for ergodic memoryless channels in a relatively 
straightforward manner, we obtain the CPUC general formula for compound memoryless channels with 
discrete alphabets. We subsequently turn to the important case where a unique zero-cost symbol is 
present in the input alphabet, and in particular investigate the achievable rates per unit cost (ARPUC) of 
an orthogonal coding scheme. While it is possible to use a sufficiently long block of training symbols
to let the receiver acquire essentially perfect knowledge of the channel transition statistics,
throughout the paper we assume that there is no such "training" phase, and thus the receiver
does not expend any effort in identifying the channel. From a practical perspective, it is often convenient
to build communication systems that operate assuming some nominal channel model, or that simply
ignore the actual channel statistics altogether, because precise identification of the underlying
channel may demand excess resources and may be infeasible in certain situations. The orthogonal coding
scheme also differs from other (possibly more powerful) coding schemes, like those using the maximum 
empirical mutual information (MMI) decoder [2], which not only achieves the capacity of compound 
memoryless channels, but also achieves their maximum random-coding error exponents. However, such 
"universal" coding schemes may be challenging to implement due to excessive decoder complexity for 
current communication systems (see, e.g., [7] and references therein for a comprehensive overview of 
information-theoretic aspects of channels with various types of uncertainty). 

Our investigation of the behavior of orthogonal codes is also motivated by the following considerations.
First, since orthogonal codes asymptotically achieve the CPUC for ergodic memoryless channels
with a zero-cost input symbol [1], it is natural to ask whether a similar or disparate result holds for
compound channels. Second, since orthogonal codes coupled with energy detection are CPUC-achieving
simultaneously for a wide class of wideband fading Gaussian channels [8] (see also [9]), it is desirable
to examine the validity of such a property for more general channels.

² The strong converse of the capacity theorem was established in [6]. Herein we will not elaborate upon the strong converse.

Part of the work of [10] is loosely related to our work. Besides other results, [10] investigates the 
ARPUC of mismatched decoding with additive decoding metrics [11]. Specifically, the coding scheme 
considered therein is non-orthogonal, and the decoding algorithm depends on the sum of channel metrics 
that are independent of symbol positions in codewords. Thus it is unclear how to compare the ARPUC 
in [10] and that obtained in our work - this is an interesting topic for future investigation. 

We briefly summarize the content of the paper. First, we introduce the CPUC problem for compound 
memoryless channels, and for discrete alphabets present a general CPUC formula, which is a slight 
generalization of the corresponding result for ergodic memoryless channels. Subsequently, the focus is 
turned to orthogonal codes for channels with a zero-cost input symbol. By modifying the decoding 
algorithm from fixed-thresholding (from Stein's lemma) [1] to maximum-seeking, we derive an ARPUC 
formula, which depends upon the choice of processing function of channel outputs, and upon the choice 
of nonzero-cost input symbol. 

We illustrate the utility of the derived ARPUC formula through several representative case studies. 
The aim is to show that the derived ARPUC formula provides a unified approach to evaluating and 
optimizing performance of cost-efficient communication systems, with different channel models and 
different receiver structures. Some additional insights regarding the behavior of these exemplar systems 
are also obtained by this exercise. We show that under certain conditions, simple receivers indeed yield 
rather good performance for corresponding compound memoryless channels. Specifically, we have the
following observations. (a) Linear receivers achieve the CPUC for certain vector Gaussian additive-noise
channels with partially unknown noise covariance, and guarantee the Gaussian-channel performance for
scalar non-Gaussian noise channels. (b) Quadratic receivers achieve the CPUC for certain multipath
channels with partially unknown multipath profile, and guarantee a Rayleigh-fading performance for
non-coherent fading Gaussian channels with only the fading covariance matrix known. (c) Photon-counting
receivers achieve the CPUC for certain Poisson channels with uncertainty in the background photon flow
rate. Besides these positive results, we also examine the lack of robustness of those
receivers under certain other circumstances.

In order to systematically investigate the ultimate performance of orthogonal codes, we derive the 
maximum ARPUC by optimizing the processing function of channel outputs as well as the nonzero-cost 
input symbol. The maximization is analytically tractable when the uncertainty set of channel transition 
statistics satisfies a convexity property, with the aid of results from minimax robustness [12]. Similar to the 
CPUC for ergodic memoryless channels, the maximized ARPUC also involves the KL distance between 
two conditional output distributions. Furthermore, here the optimization requires solving a minimax game,
which may be interpreted as one where the communication party initially selects a nonzero-cost input 
symbol in the orthogonal code, and then nature responds by selecting the least favorable channel state. 
The maximized ARPUC thus coincides with the CPUC if the aforementioned game has an equilibrium, 
but may also be strictly lower than the CPUC. Furthermore, analogous to the fact that mixed strategies 
enforce the existence of Nash equilibria in finite games [13], in our problem, by adopting a mixed
strategy in which an orthogonal code contains an appropriate composition of different nonzero-cost input 
symbols, we can further improve the ARPUC and indeed achieve the CPUC under the convexity condition. 

The remainder of the paper is organized as follows. In Section II, we introduce the CPUC problem and
present a general CPUC formula. In Section III, we describe the orthogonal coding scheme and derive
an ARPUC formula. In Section IV, we illustrate the application of the derived ARPUC formula through
several case studies. In Section V, we perform optimization for the derived ARPUC formula, and obtain
the corresponding CPUC lower bound and the CPUC that are achievable by orthogonal codes without
and with the mixed strategy, respectively. In Section VI, we conclude the paper.

II. Channel Model and General CPUC Formula 

We consider a single-user discrete-time compound memoryless channel with input $X$ and output $Y$,
with alphabets $\mathcal{X}$ and $\mathcal{Y}$, respectively. The channel has a parameter called the channel
state, $s \in \mathcal{S}$. As the channel state realization is fixed as $s$ throughout the entire coding block,
the channel is memoryless and its input-output relationship is characterized by the conditional probability
distribution $P_s(y|x)$ for input $x$ and output $y$, parametrized by $s$. As will be explicitly specified in the
paper, the alphabets $\mathcal{X}$ and $\mathcal{Y}$ may be discrete and finite sets, or continuous sets such as the
real line (in which case $P_s(y|x)$ should be understood as a probability density function). Every input symbol
$x$ is associated with a cost $c(x) \geq 0$. We adopt an additive cost structure such that for a block of channel
inputs $(x_1, x_2, \ldots, x_n)$, the associated total cost is $\sum_{i=1}^{n} c(x_i)$. Summarizing the preceding
description, we specify the compound memoryless channel model as
$\mathcal{M}_c = (\mathcal{X}, \mathcal{Y}, \mathcal{S}, P_{\cdot}(\cdot|\cdot), c(\cdot))$.

We note that the above channel model also encompasses a class of block interference channels [14],
in which there are two states: a compound channel state that remains constant for all blocks, and a
blockwise channel state that changes independently from block to block. This is because, conditioned
upon a given compound channel state, the blockwise channel state may be absorbed into the channel
transition distribution by treating each block as a supersymbol. As an example of such channels,
consider a block fading channel [15] with a partially unknown fading distribution. The fading distribution
corresponds to the compound channel state, and it statistically governs the actual channel fading
realizations, which correspond to the blockwise channel state.

When the channel state set consists of a single element, $\mathcal{S} = \{s_0\}$, the compound memoryless channel
reduces to an ergodic memoryless channel, and we write the channel model as
$\mathcal{M}_0 = (\mathcal{X}, \mathcal{Y}, P(\cdot|\cdot), c(\cdot))$, where the subscript $s_0$ of $P(\cdot|\cdot)$ is dropped.

We introduce the following definitions by adapting those for ergodic memoryless channels in [1]. For a
given channel $\mathcal{M}_c$ (including $\mathcal{M}_0$), an $(n, M, \nu, \epsilon)$ code is one in which the block
length is equal to $n$; the number of messages is equal to $M$; the codeword
$(x_1^{(m)}, \ldots, x_n^{(m)})$ corresponding to each message $m = 1, \ldots, M$ satisfies the constraint
$\sum_{i=1}^{n} c(x_i^{(m)}) \leq \nu$; and the average (over the ensemble of equiprobable
messages) probability of decoding error is no greater than $\epsilon$.

Definition 1: ([4]) Given $0 < \epsilon < 1$ and $\beta > 0$, a nonnegative number $R_c$ is an $\epsilon$-achievable rate
with cost per symbol not exceeding $\beta$ if for every $\gamma > 0$ there exists $n_0$ such that if $n > n_0$, then an
$(n, M, n\beta, \epsilon)$ code can be found with $\log M > n(R_c - \gamma)$, for every possible channel state
$s \in \mathcal{S}$. If $R_c$ is $\epsilon$-achievable for all $0 < \epsilon < 1$, it is an achievable rate; and the capacity
under average cost $\beta$, $C_c(\beta)$, is the maximum achievable rate.

Definition 2: Given $0 < \epsilon < 1$, a nonnegative number $R_c$ is an $\epsilon$-achievable rate per unit cost if for
every $\gamma > 0$, there exists $\nu_0 > 0$ such that if $\nu > \nu_0$, then an $(n, M, \nu, \epsilon)$ code can be found with
$\log M > \nu (R_c - \gamma)$, for every possible channel state $s \in \mathcal{S}$. If $R_c$ is $\epsilon$-achievable per unit
cost for all $0 < \epsilon < 1$, it is an achievable rate per unit cost (ARPUC); and the capacity per unit cost
(CPUC) $C_c$ is the maximum ARPUC.

For ergodic memoryless channels, the following proposition establishes a general formula of the CPUC. 

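Proposition 1: ([1, Theorem 2]) For an ergodic memoryless channel
$\mathcal{M}_0 = (\mathcal{X}, \mathcal{Y}, P(\cdot|\cdot), c(\cdot))$, the CPUC is

$$C_0 = \sup_{\beta > 0,\; P_X:\, E[c(X)] \leq \beta} \frac{I(X;Y)}{\beta}. \tag{1}$$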

In parallel to Proposition 1, the following proposition establishes a general formula of the CPUC of
compound memoryless channels with discrete and finite alphabets.

Proposition 2: For a compound memoryless channel
$\mathcal{M}_c = (\mathcal{X}, \mathcal{Y}, \mathcal{S}, P_{\cdot}(\cdot|\cdot), c(\cdot))$, in which $\mathcal{X}$ and
$\mathcal{Y}$ are discrete and finite sets, the CPUC is

$$C_c = \sup_{\beta > 0,\; P_X:\, E[c(X)] \leq \beta} \;\; \inf_{P_s(\cdot|\cdot):\, s \in \mathcal{S}} \frac{I(X;Y)}{\beta}. \tag{2}$$

The proof of Proposition 2 is in Appendix VII-A. It essentially follows the proof of Proposition 1, with
the classical result of compound channel capacity [4] appropriately utilized.

III. Orthogonal Coding Scheme for Compound Memoryless Channels 

In the sequel, we assume that there exists a unique symbol $x_f$ with cost zero, i.e., $c(x_f) = 0$, in the
input alphabet $\mathcal{X}$. For ergodic memoryless channels, the following proposition gives a particularly simple
formula for the CPUC.

Proposition 3: ([1, Theorem 3]) For an ergodic memoryless channel
$\mathcal{M}_0 = (\mathcal{X}, \mathcal{Y}, P(\cdot|\cdot), c(\cdot))$ with a unique zero-cost input symbol $x_f$, the CPUC is

$$C_0 = \sup_{x \in \mathcal{X}'} \frac{D(P(y|x) \,\|\, P(y|x_f))}{c(x)}, \tag{3}$$

where $\mathcal{X}' := \mathcal{X} - \{x_f\}$ and $D(\cdot\|\cdot)$ is the KL distance.
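For a finite alphabet, formula (3) reduces to a maximum over the nonzero-cost symbols and can be evaluated directly. The following is a minimal sketch, not taken from the paper: the binary-input channel, its transition probabilities, and the function names are hypothetical, and every output probability under the free symbol is assumed positive so that the KL distance is finite.

```python
import math

def kl_divergence(p, q):
    """KL distance D(p||q) in nats for two finite distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cpuc_with_free_symbol(channel, cost, free):
    """Evaluate (3): maximize D(P(.|x) || P(.|free)) / c(x) over nonzero-cost x."""
    return max(
        kl_divergence(channel[x], channel[free]) / cost[x]
        for x in channel
        if x != free
    )

# Hypothetical channel: input 0 is the free symbol, input 1 costs one unit.
channel = {0: [0.9, 0.1], 1: [0.2, 0.8]}
cost = {0: 0.0, 1: 1.0}
c0 = cpuc_with_free_symbol(channel, cost, free=0)
print(c0)  # CPUC in nats per unit cost
```

With a single nonzero-cost symbol the maximum is trivial; adding further entries to `channel` and `cost` exercises the maximization over $\mathcal{X}'$.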

In Section V, we will establish analogous results for compound memoryless channels that satisfy a
certain convexity condition. In this section, we focus on the performance of specific orthogonal coding
schemes. An important observation from [1] is that a deterministic orthogonal code can be explicitly
constructed which asymptotically achieves the CPUC in Proposition 3 as the coding block length grows
large.

Except in Section V-C, we will consider a simple orthogonal coding scheme without mixed strategy.
The codebook construction and the corresponding decoding procedure are described as follows (see Figure 1).

Codebook:

Assume that the code consists of $M$ equally probable messages and that the coding block length is
$n = M\bar{n}$. Each codeword is virtually represented by an $M \times \bar{n}$ two-dimensional block of channel uses.
To represent message $m$, the $\bar{n}$ elements in the $m$th row, $\{X_{m,i}\}_{i=1}^{\bar{n}}$, take a symbol
$x_c \in \mathcal{X}'$; and all the other $(M - 1)$ rows are all $x_f$.

Channel transmission:

After the transmission of a codeword, the receiver observes $M$ mutually independent length-$\bar{n}$ random
vectors. Conditioned upon the compound channel state realization $s$, exactly one of the $M$ vectors
consists of $\bar{n}$ independent and identically distributed (i.i.d.) random variables following the distribution
$P_{s,c}(y) := P_s(y|x_c)$, and all the remaining vectors follow $P_{s,f}(y) := P_s(y|x_f)$.
Decoding:

Given the received $M \times \bar{n}$ block of channel outputs, if the receiver correctly finds the index of the
row of inputs $x_c$, we successfully decode the message; otherwise a decoding error occurs. For ergodic
memoryless channels, a decoding algorithm based on Stein's lemma (see, e.g., [16, Chap. 12, Sec. 8])
has been utilized in [1] to asymptotically achieve $C_0$ as $n \to \infty$. The key component of that algorithm's
implementation is a thresholding operation with threshold $D(P_c(y)\|P_f(y)) - \xi$, where $\xi > 0$ is made
arbitrarily small as $n \to \infty$. For compound memoryless channels, such a threshold depends upon the
channel state $s$, and thus the decoding algorithm is not applicable. Alternatively, we consider a decoding
algorithm that computes a metric for each row of the received block and declares the decoded message
as the row index with the maximum metric. Such an algorithm has been utilized in [8] for the particular
case of fading Gaussian channels with energy detection, yielding an ARPUC identical to the CPUC of
a Gaussian channel without fading, for any fading distributions with an identical second moment.

Fig. 1. Illustration of an orthogonal codeword.

We now describe our decoding algorithm in detail. For computing the decoding metrics, the receiver
first transforms each received channel output $Y_{m,i}$ into $g(Y_{m,i})$, $m = 1, \ldots, M$, $i = 1, \ldots, \bar{n}$, where
$g(\cdot)$ is an arbitrary real function that is measurable with respect to the probability measures $P_{s,f}(y)$ and
$P_{s,c}(y)$, and satisfies $E_{P_{s,c}}[g(Y)] < \infty$ for all $s \in \mathcal{S}$. The receiver then computes the decoding metrics
as

$$T_m := \frac{1}{\bar{n}} \sum_{i=1}^{\bar{n}} g(Y_{m,i}), \quad m = 1, \ldots, M. \tag{4}$$
The decoding rule is to declare the decoded message $\hat{m}$ as that which maximizes the $M$ decoding metrics,
i.e.,

$$\hat{m} = \arg\max_{m = 1, \ldots, M} T_m. \tag{5}$$

If there exists a tie, then an error is declared.
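The maximum-seeking decoder is simple enough to sketch in a few lines. The code below is illustrative only: the function names, the real-valued additive Gaussian channel, and the identity processing function are assumptions made for the example, not part of the paper's model. It builds one orthogonal codeword (one row of a nonzero-cost symbol, all other rows zero), passes it through the channel, averages $g$ over each row, and returns the row with the largest metric, declaring an error on ties.

```python
import random

def decode_max_metric(block, g):
    """Return the row index maximizing the row average of g, or None on a tie."""
    metrics = [sum(g(y) for y in row) / len(row) for row in block]
    best = max(metrics)
    winners = [m for m, t in enumerate(metrics) if t == best]
    return winners[0] if len(winners) == 1 else None

def simulate(message, num_messages, row_len, x_c, noise_std, rng):
    """Send one orthogonal codeword over a hypothetical channel Y = X + Z, Z Gaussian."""
    block = [
        [(x_c if m == message else 0.0) + rng.gauss(0.0, noise_std)
         for _ in range(row_len)]
        for m in range(num_messages)
    ]
    return decode_max_metric(block, g=lambda y: y)

rng = random.Random(0)
decoded = simulate(message=2, num_messages=4, row_len=50, x_c=5.0, noise_std=0.1, rng=rng)
print(decoded)
```

Note that the decoder never uses the noise level or any other channel parameter, which is the property that makes it usable on a compound channel.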

For the described orthogonal coding scheme, its ARPUC is given by the following proposition.

Proposition 4: For a compound memoryless channel
$\mathcal{M}_c = (\mathcal{X}, \mathcal{Y}, \mathcal{S}, P_{\cdot}(\cdot|\cdot), c(\cdot))$, if:

(a) the input alphabet has a unique zero-cost symbol $x_f$;

(b) an orthogonal code is used with the nonzero-cost input symbol $x_c$ and with the transformation
function $g(\cdot)$, such that condition

$$\inf_{s \in \mathcal{S}} \left\{ E_{P_{s,c}}[g(Y)] - E_{P_{s,f}}[g(Y)] \right\} > 0 \tag{6}$$

is satisfied,
then the ARPUC is

$$R_c(x_c, g) = \inf_{s \in \mathcal{S}} \sup_{\theta \geq 0} \frac{1}{c(x_c)} \left\{ \theta E_{P_{s,c}}[g(Y)] - \log E_{P_{s,f}}[\exp(\theta g(Y))] \right\}. \tag{7}$$

If condition (6) is violated, then the ARPUC is zero.
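Proof. The decoding error probability is equal to

$$\Pr\left[ T_1 \leq \max_{m = 2, \ldots, M} T_m \right],$$

where for $T_1$, the channel outputs $\{Y_{1,i}\}_{i=1}^{\bar{n}}$ are distributed as $P_{s,c}(y)$, and for $T_m$, $m \neq 1$,
the channel outputs $\{Y_{m,i}\}_{i=1}^{\bar{n}}$ are distributed as $P_{s,f}(y)$. Now let us examine the conditions for
the decoding error probability to vanish as $\bar{n}$ grows large.

A prerequisite is that the expectation of $T_1$ should be strictly greater than that of $T_m$, $m \neq 1$, for all
possible channel state realizations, leading to

$$\inf_{s \in \mathcal{S}} \left\{ E_{P_{s,c}}[g(Y)] - E_{P_{s,f}}[g(Y)] \right\} > 0. \tag{8}$$

Then let us fix a channel state realization $s$. Since $T_1$ is the empirical mean of $\bar{n}$ i.i.d. random variables,
by the weak law of large numbers,

$$\lim_{\bar{n} \to \infty} \Pr\left[ \left| T_1 - E_{P_{s,c}}[g(Y)] \right| < \xi \right] = 1 \tag{9}$$

for any $\xi > 0$. So for arbitrarily small $\epsilon > 0$ and $\xi > 0$, there exists a positive integer $n_1$ such that for
every $\bar{n} > n_1$,

$$\Pr\left[ T_1 \leq E_{P_{s,c}}[g(Y)] - \xi \right] \leq \frac{\epsilon}{2}. \tag{10}$$

For $\max_{m = 2, \ldots, M} T_m$, the cumulative distribution function is bounded by

$$\Pr\left[ \max_{m = 2, \ldots, M} T_m < t \right] = \left\{ 1 - \Pr[T_2 \geq t] \right\}^{M-1} \geq 1 - M \cdot \Pr[T_2 \geq t]. \tag{11}$$

The tail probability $\Pr[T_2 \geq t]$ can be further upper bounded by the Chernoff bound as

$$\Pr[T_2 \geq t] \leq \exp\left\{ -\bar{n} \cdot \sup_{\theta \geq 0} \left[ \theta t - \log E_{P_{s,f}}[\exp(\theta g(Y))] \right] \right\}. \tag{12}$$

Hence we have

$$\begin{aligned}
\Pr\left[ \max_{m = 2, \ldots, M} T_m \geq E_{P_{s,c}}[g(Y)] - \xi \right]
&= 1 - \Pr\left[ \max_{m = 2, \ldots, M} T_m < E_{P_{s,c}}[g(Y)] - \xi \right] \\
&\leq M \cdot \Pr\left[ T_2 \geq E_{P_{s,c}}[g(Y)] - \xi \right] \\
&\leq M \cdot \exp\left\{ -\bar{n} \cdot \sup_{\theta \geq 0} \left[ \theta \left( E_{P_{s,c}}[g(Y)] - \xi \right) - \log E_{P_{s,f}}[\exp(\theta g(Y))] \right] \right\},
\end{aligned} \tag{13}$$

which can be made no greater than $\epsilon/2$ for every $\bar{n} > n_2$, where $n_2$ is a sufficiently large integer, if the
growth of $M$ satisfies

$$\frac{\log M}{\bar{n}} \leq \sup_{\theta \geq 0} \left[ \theta \left( E_{P_{s,c}}[g(Y)] - \xi \right) - \log E_{P_{s,f}}[\exp(\theta g(Y))] \right] - \xi. \tag{14}$$

Since a decoding error requires either the event in (10) or the event in (13), the decoding error probability
is then no greater than $\epsilon$. Each codeword incurs a total cost of $\bar{n} c(x_c)$, so the rate per unit cost is
$\log M / (\bar{n} c(x_c))$. To complete the proof, we let $\epsilon \to 0$ and $\xi \to 0$, and take the infimum of the right
hand side of (14) over all possible $s \in \mathcal{S}$. Then the ARPUC (7) is achievable as $n \to \infty$. Q.E.D.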
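To make formula (7) concrete, it can be evaluated numerically when the state set is finite and the two moments it needs are available per state. The sketch below is a hedged illustration rather than the paper's procedure: it replaces the supremum over $\theta$ by a grid search, and the two-state scalar Gaussian example, the helper names, and the grid are all assumptions. For $Y = X + Z$ with $Z \sim \mathcal{N}(0, \sigma^2)$ and $g(y) = y$, the inner supremum equals $x_c^2/(2\sigma^2)$, so the worst state is the one with the largest variance.

```python
import math

def gaussian_log_mgf(variance):
    """log E[exp(theta * Y)] for Y ~ N(0, variance), i.e. g(Y) = Y under the free symbol."""
    return lambda theta: 0.5 * variance * theta ** 2

def arpuc(states, cost, theta_grid):
    """Grid-search evaluation of (7).

    states: list of (mean_c, mean_f, log_mgf_f) triples, one per channel state, with
    mean_c = E_{P_{s,c}}[g(Y)], mean_f = E_{P_{s,f}}[g(Y)], and
    log_mgf_f(theta) = log E_{P_{s,f}}[exp(theta * g(Y))].
    """
    # Condition (6): the metric means must be separated in every state.
    if min(mean_c - mean_f for mean_c, mean_f, _ in states) <= 0:
        return 0.0
    return min(
        max(theta * mean_c - log_mgf(theta) for theta in theta_grid) / cost
        for mean_c, _, log_mgf in states
    )

# Hypothetical compound channel: noise variance is either 1 or 2, x_c = 1, g(y) = y.
x_c = 1.0
states = [(x_c, 0.0, gaussian_log_mgf(v)) for v in (1.0, 2.0)]
theta_grid = [k / 1000 for k in range(5001)]
rate = arpuc(states, cost=x_c ** 2, theta_grid=theta_grid)
print(rate)
```

A grid search only lower-bounds the supremum in general; here the maximizing values of $\theta$ happen to lie on the grid.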

• For ergodic memoryless channels, by letting $g(Y) = \log[P_c(Y)/P_f(Y)]$ and $\theta = 1$, the ARPUC (7)
becomes

$$R_0 = \frac{1}{c(x_c)} E_{P_c}\left[ \log \frac{P_c(Y)}{P_f(Y)} \right] = \frac{D(P_c \| P_f)}{c(x_c)}. \tag{15}$$

By optimizing over all possible nonzero-cost input symbols $x_c \in \mathcal{X}'$, we recover the achievability of
$C_0$ in Proposition 3. We note that to achieve $R_0$, either thresholding ([1]) or maximum-seeking as
described above suffices.

• As can be seen from its proof, Proposition 4 is not based upon Proposition 2, and thus does not
require the alphabets $\mathcal{X}, \mathcal{Y}$ to be discrete or finite. In Section IV, we indeed evaluate the ARPUC for
several channels with continuous channel outputs.

• Proposition 4 only establishes the achievability of the ARPUC. In fact, by utilizing Cramér's theorem
(see, e.g., [17, Theorem 2.2.3]), we can show that $R_c(x_c, g)$ is also the maximum ARPUC for the
orthogonal coding scheme with given $x_c$ and $g(\cdot)$, as shown in Appendix VII-B.

• In the orthogonal coding scheme, we can optimize over all possible $g(\cdot)$ and $x_c \in \mathcal{X}'$ to maximize
$R_c(x_c, g)$ in Proposition 4. We will systematically perform the optimization in Section V for channels
whose uncertainty satisfies a convexity property.



IV. Evaluation of ARPUC for Several Applications

In this section, we illustrate the utility of Proposition 4 through a series of exemplar applications. We
consider several representative receivers, applied to several practically important channel models. Hence
these case studies are by no means purely academic exercises, but shed light on the behavior of certain
communication systems under channel uncertainties. These case studies include linear and quadratic
receivers for linear additive-noise channels, quadratic receivers for non-coherent fading channels, and
photon-counting receivers for Poisson channels. We observe that, under various situations, Proposition 4
provides a unified approach to evaluating and optimizing performance of these classes of receiver-channel
pairs.

A. Linear and Quadratic Receivers for Linear Additive-Noise Channels

Consider the discrete-time memoryless additive-noise channel

$$\underline{Y} = \underline{H}^T \underline{X} + \underline{Z}, \tag{16}$$

where all the quantities are real-valued. The input $\underline{X}$ is $n_t$-dimensional, the output $\underline{Y}$ and the additive
noise $\underline{Z}$ are $n_r$-dimensional, and the deterministic channel transfer matrix $\underline{H}$ is $n_t \times n_r$. For simplicity,
we assume that $\underline{Z}$ is independent of $\underline{X}$ and has mean zero. Such a channel model encompasses
multi-dimensional Gaussian channels (e.g., [18], [19]), additive-noise channels with non-Gaussian noises (e.g.,
[20]), and channels with intersymbol interference (ISI) or multipath (e.g., [21]).

A linear receiver combines components of the channel output vector, and may be readily implemented
by a linear filter. It possesses the following structure,

$$g_l(\underline{Y}) = \underline{w}_l^T \underline{Y}, \tag{17}$$

where $\underline{w}_l$ is a deterministic $n_r$-dimensional combining vector. Alternatively, a general quadratic receiver
extracts signal energy, but ignores the phase information. It possesses the following structure,

$$g_q(\underline{Y}) = \underline{Y}^T \underline{W}_q \underline{Y}, \tag{18}$$

where $\underline{W}_q$ is a deterministic positive semi-definite matrix.

For the linear additive-noise channel, we define the cost function as the energy of an input, so that

$$c(\underline{X}) := \sum_{i=1}^{n_t} |X_i|^2 = \|\underline{X}\|^2, \tag{19}$$

and the zero-cost symbol is $\underline{x}_f = \underline{0}$. The channel uncertainty has two components. First, the channel
transfer matrix $\underline{H}$ is arbitrarily drawn from a set $\mathcal{H}$. Second, the distribution of the additive noise, $P_Z$,
is arbitrarily drawn from a set of distributions, $\mathcal{P}_Z$. So the channel state is denoted by $s = (\underline{H}, P_Z)$.

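Applying Proposition 4, we can evaluate the ARPUC for linear and quadratic receivers as follows,
noting that $\underline{x}_c$ is the nonzero-cost input symbol (vector) in the orthogonal coding scheme.

Linear receivers:

$$R_c(\underline{x}_c, \underline{w}_l) = \inf_{s \in \mathcal{S}} \sup_{\theta \geq 0} \frac{1}{\|\underline{x}_c\|^2} \left\{ \theta (\underline{H}\,\underline{w}_l)^T \underline{x}_c - \log E[\exp(\theta \underline{w}_l^T \underline{Z})] \right\}, \tag{20}$$

if $\inf_{\underline{H} \in \mathcal{H}} (\underline{H}\,\underline{w}_l)^T \underline{x}_c > 0$; and zero otherwise.

Quadratic receivers:

$$R_c(\underline{x}_c, \underline{W}_q) = \inf_{s \in \mathcal{S}} \sup_{\theta \geq 0} \frac{1}{\|\underline{x}_c\|^2} \left\{ \theta \underline{x}_c^T \underline{H}\,\underline{W}_q \underline{H}^T \underline{x}_c + \theta E[\underline{Z}^T \underline{W}_q \underline{Z}] - \log E[\exp(\theta \underline{Z}^T \underline{W}_q \underline{Z})] \right\}, \tag{21}$$

if $\inf_{\underline{H} \in \mathcal{H}} \underline{x}_c^T \underline{H}\,\underline{W}_q \underline{H}^T \underline{x}_c > 0$; and zero otherwise.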

For concreteness, we examine the following specific examples in more detail. 

Example 1: (Linear Receivers for Gaussian Noise Channels with Partially Unknown Covariance) Let
$n_t = n_r$, $\mathcal{H} = \{\underline{I}\}$, and $\underline{Z} \sim \mathcal{N}(\underline{0}, \underline{\Phi})$ with positive-semidefinite covariance matrix
$\underline{\Phi} \in \mathcal{S}_\Phi$. The capacity
of such a channel has been addressed as a special case of compound linear Gaussian channels in [22].
Here we focus on its ARPUC and CPUC, and as will be shown, under a specific convexity condition, the
CPUC can be achieved by the orthogonal coding scheme with a linear receiver. Another related problem
has been addressed in [23] (also see references therein), where the focus is on finding the worst noise
distribution under a given covariance structure. Here in contrast, we focus on the case where the noise
is Gaussian, but its covariance matrix is not perfectly identified, due to practical issues such as limited
channel training or time-variations.

We consider linear receivers. The ARPUC can be shown to be

$$R_c(\underline{x}_c, \underline{w}_l) = \inf_{\underline{\Phi} \in \mathcal{S}_\Phi} \frac{(\underline{w}_l^T \underline{x}_c)^2}{2 \|\underline{x}_c\|^2 \, \underline{w}_l^T \underline{\Phi}\, \underline{w}_l}, \tag{22}$$

if $\underline{w}_l^T \underline{x}_c > 0$; and zero otherwise.
From (22), we can proceed further to optimize the ARPUC over all feasible $\underline{x}_c$ and $\underline{w}_l$. From the
Cauchy-Schwarz inequality, we have

$$R_c(\underline{x}_c, \underline{w}_l) \leq \inf_{\underline{\Phi} \in \mathcal{S}_\Phi} \frac{\|\underline{w}_l\|^2}{2\, \underline{w}_l^T \underline{\Phi}\, \underline{w}_l}, \tag{23}$$

where equality is achieved if and only if $\underline{w}_l$ is proportional to $\underline{x}_c$. Therefore, the optimal ARPUC $R_c$ is

$$R_c = \sup_{\|\underline{w}_l\| = 1} \inf_{\underline{\Phi} \in \mathcal{S}_\Phi} \frac{1}{2\, \underline{w}_l^T \underline{\Phi}\, \underline{w}_l}. \tag{24}$$

Assume that the uncertainty set $\mathcal{S}_\Phi$ is compact and convex, and that
$\max_{\underline{\Phi} \in \mathcal{S}_\Phi} \lambda_{\min}(\underline{\Phi})$ is bounded
away from zero, where $\lambda_{\min}(\underline{\Phi})$ denotes the minimum eigenvalue of $\underline{\Phi}$. We can utilize results of minimax
robustness to find that (24) has a saddle point, so that

$$R_c = \frac{1}{2 \max_{\underline{\Phi} \in \mathcal{S}_\Phi} \lambda_{\min}(\underline{\Phi})}. \tag{25}$$

We show the derivation of (25) in Appendix VII-C. We note that (25) coincides with the CPUC under
the ideal assumption that the transmitter and the receiver both have perfect knowledge of the realization
of $\underline{\Phi}$. Therefore in this example we have shown that, for compact and convex $\mathcal{S}_\Phi$, linear receivers indeed
achieve the CPUC of Gaussian noise channels with partially unknown covariance.
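The saddle-point claim can be checked numerically on a toy uncertainty set. The sketch below is an illustration under stated assumptions, not the paper's derivation: it takes the hypothetical convex and compact set of $2 \times 2$ covariances $\{\mathrm{diag}(a, 3 - a) : a \in [1, 2]\}$, evaluates the maximin expression over unit-norm combining vectors by grid search, and compares it with the closed form in terms of the largest minimum eigenvalue.

```python
import math

def worst_case_rate(w, a_grid):
    """Infimum over Phi = diag(a, 3 - a) of 1 / (2 w^T Phi w) for a unit vector w."""
    return min(
        1.0 / (2.0 * (a * w[0] ** 2 + (3.0 - a) * w[1] ** 2))
        for a in a_grid
    )

# Hypothetical uncertainty set: a ranges over [1, 2].
a_grid = [1.0 + k / 100 for k in range(101)]
# Unit vectors w = (cos t, sin t); the grid contains t = pi/4.
angles = [math.pi * k / 400 for k in range(401)]

maximin = max(worst_case_rate((math.cos(t), math.sin(t)), a_grid) for t in angles)
# Closed form: 1 / (2 * max over the set of the minimum eigenvalue min(a, 3 - a)).
saddle = 1.0 / (2.0 * max(min(a, 3.0 - a) for a in a_grid))
print(maximin, saddle)
```

In this toy case the best combining vector weights both coordinates equally, which makes the output noise variance the same for every covariance in the set.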
Example 2: (Quadratic Receivers for Multipath Channels)

Let $n_t = 1$, $\underline{Z} \sim \mathcal{N}(\underline{0}, \underline{I})$, and $\underline{H} \in \mathcal{H}$ models channel multipath. In many sparse multipath channels,
$\underline{H}$ may have a dimension of several tens or higher with mostly zero elements, e.g., [24], [25], [26]. For
those situations, linear receivers like the RAKE receiver (e.g., [21]) can be sensitive to channel estimation
errors. Alternatively, quadratic receivers may be employed to detect signal energy. By Proposition 4, for
quadratic receivers the ARPUC can be shown to be

$$R_c(x_c, \underline{W}_q) = \inf_{\underline{H} \in \mathcal{H}} \sup_{\theta \geq 0} \frac{1}{x_c^2} \left\{ \theta x_c^2 (\underline{H}\,\underline{W}_q \underline{H}^T) + \theta\, \mathrm{tr}[\underline{W}_q] + \frac{1}{2} \log\det(\underline{I} - 2\theta \underline{W}_q) \right\}, \tag{26}$$

with the condition $\inf_{\underline{H} \in \mathcal{H}} (\underline{H}\,\underline{W}_q \underline{H}^T) > 0$ satisfied, where $\theta$ is bounded such that
$\det(\underline{I} - 2\theta \underline{W}_q) > 0$. If $\underline{W}_q = \underline{I}$, (26) further becomes

$$R_c(x_c, \underline{I}) = \inf_{\underline{H} \in \mathcal{H}} \left\{ \frac{\|\underline{H}\|^2}{2} - \frac{n_r}{2 x_c^2} \log\left( 1 + \|\underline{H}\|^2 x_c^2 / n_r \right) \right\}. \tag{27}$$

It is easily seen that (27) is maximized by letting $|x_c| \to \infty$, as

$$R_c = \inf_{\underline{H} \in \mathcal{H}} \frac{\|\underline{H}\|^2}{2}, \tag{28}$$

which is also the CPUC under the ideal assumption that the transmitter and the receiver both have perfect
knowledge of the realization of $\underline{H}$.

On the other hand, if there is a finite peak limit for $x_c$, then as the channel dimension $n_r \to \infty$, (27)
becomes zero for any $\|\underline{H}\|^2 < \infty$. Intuitively, this is because as the channel dimension increases, the
quadratic receiver tends to collect more noise than useful signals. In practice, a multipath channel often
has distinct multipath delay profiles from realization to realization (e.g., [27]), so robust receivers designed
here tend to yield rather conservative performance for most channel realizations, because they need to
cope with channels with extremely long delay spreads, which occur with a rather small probability. A
more plausible alternative may be an outage-based approach, which is beyond the scope of this paper.
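The closed form for the identity weighting and its two limits can be checked numerically for a single channel realization. The sketch below is illustrative only: the function names and the numerical values are assumptions, $\|\underline{H}\|^2$ and $x_c^2$ are passed as plain numbers, and the supremum over $\theta$ is approximated on a grid over $[0, 1/2)$.

```python
import math

def closed_form(h2, x2, n_r):
    """||H||^2/2 - n_r/(2 x_c^2) * log(1 + ||H||^2 x_c^2 / n_r), with h2 = ||H||^2, x2 = x_c^2."""
    return h2 / 2.0 - n_r / (2.0 * x2) * math.log1p(h2 * x2 / n_r)

def grid_search(h2, x2, n_r, steps=5000):
    """Maximize {theta*x2*h2 + theta*n_r + (n_r/2)*log(1 - 2*theta)} / x2 over 0 <= theta < 1/2."""
    best = 0.0
    for k in range(steps):
        theta = 0.5 * k / steps
        value = theta * (x2 * h2 + n_r) + 0.5 * n_r * math.log(1.0 - 2.0 * theta)
        best = max(best, value / x2)
    return best

# Hypothetical numbers: ||H||^2 = 2, x_c^2 = 4, n_r = 3.
print(closed_form(2.0, 4.0, 3), grid_search(2.0, 4.0, 3))
# Large input amplitude: approaches ||H||^2 / 2.
print(closed_form(2.0, 1e8, 3))
# Large channel dimension with bounded input: approaches zero.
print(closed_form(2.0, 4.0, 10 ** 6))
```

The last two lines illustrate the contrast drawn in the text: the rate approaches $\|\underline{H}\|^2/2$ when the input amplitude is unconstrained, and collapses when the dimension grows with a bounded amplitude.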

Example 3: (Scalar Non-Gaussian Noise Channels)

Let $n_t = n_r = 1$, $\mathcal{H} = \{1\}$, and the zero-mean additive noise $Z$ have an unknown probability
density function $P_Z \in \mathcal{P}_Z$ with a common variance $\mathrm{var}(Z) < \infty$. Without loss of generality, we can let
$w_l = W_q = 1$, and the corresponding ARPUCs are

$$R_c(x_c, 1) = \inf_{P_Z} \sup_{\theta \geq 0} \frac{1}{x_c^2} \left\{ \theta x_c - \log E[\exp(\theta Z)] \right\}, \quad \text{for linear receivers}; \tag{29}$$

and

$$R_c(x_c, 1) = \inf_{P_Z} \sup_{\theta \geq 0} \frac{1}{x_c^2} \left\{ \theta (x_c^2 + \mathrm{var}(Z)) - \log E[\exp(\theta Z^2)] \right\}, \quad \text{for quadratic receivers}. \tag{30}$$



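For linear receivers, if we let $x_c \to 0$ and $\theta = x_c/\mathrm{var}(Z)$ in (29), we get

$$\begin{aligned}
\lim_{x_c \to 0} R_c(x_c, 1)
&\geq \lim_{x_c \to 0} \inf_{P_Z} \left\{ \frac{1}{\mathrm{var}(Z)} - \frac{1}{x_c^2} \log E\left[ \exp(x_c Z/\mathrm{var}(Z)) \right] \right\} \\
&= \frac{1}{\mathrm{var}(Z)} - \lim_{x_c \to 0} \frac{\sup_{P_Z} \log E\left[ \exp(x_c Z/\mathrm{var}(Z)) \right]}{x_c^2} \\
&= \frac{1}{\mathrm{var}(Z)} - \lim_{x_c \to 0} \frac{\sup_{P_Z} \log E\left[ 1 + x_c Z/\mathrm{var}(Z) + (1/2) \cdot x_c^2 Z^2/\mathrm{var}(Z)^2 + o(x_c^2) \right]}{x_c^2} \\
&= \frac{1}{\mathrm{var}(Z)} - \lim_{x_c \to 0} \frac{\sup_{P_Z} \log\left[ 1 + (1/2) \cdot x_c^2/\mathrm{var}(Z) + o(x_c^2) \right]}{x_c^2} \\
&= \frac{1}{\mathrm{var}(Z)} - \frac{1}{2 \cdot \mathrm{var}(Z)} = \frac{1}{2 \cdot \mathrm{var}(Z)},
\end{aligned} \tag{31}$$

independent of the actual noise distribution, and identical to the CPUC of Gaussian channels whose
noise variance is $\mathrm{var}(Z)$. The lower bound (31) may be interpreted as an indication that Gaussian noise
is the worst one under a given variance constraint. It is interesting that a simple linear receiver suffices
to provide such a performance guarantee.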

In contrast, for quadratic receivers, there is no performance guarantee like (31). We can construct impulsive noise distributions such that the resulting ARPUC is arbitrarily close to zero. In fact, consider the mixed Gaussian distribution

$$P_Z(z) = (1 - \epsilon) \frac{1}{\sqrt{\pi}} e^{-z^2} + \epsilon \frac{1}{\sqrt{\pi A}} e^{-z^2/A},$$

where $A = (1 + \epsilon)/\epsilon > 1$ so that $\mathrm{var}(Z) = 1$ for any $\epsilon \in (0, 1)$. The ARPUC (30) becomes (see Appendix VII-D)

$$R_c(x_c, 1) = \sup_{\theta \geq 0} \frac{1}{x_c^2} \left\{ \theta (x_c^2 + 1) + \log \sqrt{1 - \theta} - \log\left[ 1 + \epsilon \left( \sqrt{(1 - \theta)/(1 - A\theta)} - 1 \right) \right] \right\}, \tag{32}$$

for $0 \leq \theta < 1/A$. Therefore we can upper bound $R_c(x_c, 1)$, upon noticing that the last two logarithmic terms in (32) are both non-positive, by

$$R_c(x_c, 1) \leq \sup_{0 \leq \theta < 1/A} \theta \left( 1 + \frac{1}{x_c^2} \right) = \frac{1}{A} \left( 1 + \frac{1}{x_c^2} \right) \to 0$$

as $A \to \infty$, for arbitrary $x_c \neq 0$. Intuitively, the quadratic receiver cannot attain robust performance against impulsive noise because it tends to "amplify" peaks in noise, which occur with a relatively high frequency for impulsive noise.
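The decay of (32) can be checked numerically. The sketch below grid-searches the supremum in (32) over $0 \leq \theta < 1/A$ and compares it with the upper bound $(1 + 1/x_c^2)/A$. The function name and the grid resolution are illustrative assumptions, and a grid search only approximates the supremum from below.

```python
import math

def quadratic_arpuc_mixture(x_c, eps, grid=20000):
    """Approximate the supremum in (32) on a grid and return it with the bound (1 + 1/x_c^2)/A."""
    A = (1.0 + eps) / eps
    best = 0.0  # theta = 0 always yields the value 0
    for k in range(1, grid):
        theta = (k / grid) / A  # stays strictly below 1/A
        ratio = math.sqrt((1.0 - theta) / (1.0 - A * theta))
        value = (theta * (x_c**2 + 1.0)
                 + 0.5 * math.log(1.0 - theta)
                 - math.log(1.0 + eps * (ratio - 1.0))) / x_c**2
        best = max(best, value)
    bound = (1.0 + 1.0 / x_c**2) / A
    return best, bound

for eps in (0.5, 0.1, 0.01):
    print(eps, quadratic_arpuc_mixture(1.0, eps))
```

Smaller $\epsilon$ means larger $A$, so both the grid value and the bound should shrink, consistent with the claim that the quadratic receiver offers no guarantee against this impulsive noise.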

B. Quadratic Receivers for Non-Coherent Fading Channels 
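Consider the discrete-time memoryless fading channel

$$\mathbf{Y} = \mathbf{H}^\dagger \mathbf{X} + \mathbf{Z}, \tag{33}$$

where all the symbols are complex-valued. The input $\mathbf{X}$ is $n_t$-dimensional, the output $\mathbf{Y}$ and the additive noise $\mathbf{Z}$ are $n_r$-dimensional, and the fading matrix $\mathbf{H}$ is $n_t \times n_r$. We assume that the noise is circularly symmetric complex Gaussian, i.e., $\mathbf{Z} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I})$, and that the random fading matrix $\mathbf{H}$ is independent of $\mathbf{Z}$ and has a distribution belonging to an uncertainty set, $\mathcal{P}_H$. When neither the transmitter nor the receiver has knowledge of the realizations of $\mathbf{H}$, it is customary to employ quadratic receivers to process the channel output, e.g., [28], as

$$g(\mathbf{Y}) = \mathbf{Y}^\dagger \mathbf{G} \mathbf{Y}, \tag{34}$$

where $\mathbf{G}$ is a deterministic positive semi-definite matrix. We define the cost function as the energy of an input, so that

$$c(\mathbf{X}) := \sum_{i=1}^{n_t} |X_i|^2 = \|\mathbf{X}\|^2, \tag{35}$$

and the zero-cost symbol is $\mathbf{x}_f = \mathbf{0}$.

Following Proposition 4, the ARPUC can be shown to be

$$R_c(\mathbf{x}_c, \mathbf{G}) = \inf_{P_H \in \mathcal{P}_H} \sup_{\theta \geq 0} \frac{1}{\|\mathbf{x}_c\|^2} \left\{ \mathrm{tr}[\mathbf{G}]\theta + \theta\, \mathbf{x}_c^\dagger \mathrm{E}[\mathbf{H}\mathbf{G}\mathbf{H}^\dagger] \mathbf{x}_c + \log\det(\mathbf{I} - \theta\mathbf{G}) \right\}, \tag{36}$$

with the condition $\inf_{P_H \in \mathcal{P}_H} \mathbf{x}_c^\dagger \mathrm{E}[\mathbf{H}\mathbf{G}\mathbf{H}^\dagger] \mathbf{x}_c > 0$ satisfied. Note that $\theta$ is bounded such that $\det(\mathbf{I} - \theta\mathbf{G}) > 0$.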

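Example 4: (The Case of a Single Antenna at Either the Transmit or the Receive Side)

Of particular simplicity is the case where the ARPUC only depends upon the covariance matrix of $\mathbf{H}$. This property can hold when either the transmit side or the receive side has dimension one. In the multiple-input-single-output (MISO) case, there is no loss of generality in letting $G = 1$, and the ARPUC can be shown to be

$$R_c^{\mathrm{MISO}}(\mathbf{x}_c, 1) = \inf_{P_H \in \mathcal{P}_H} \frac{1}{\|\mathbf{x}_c\|^2} \left\{ \mathbf{x}_c^\dagger \mathrm{E}(\mathbf{H}\mathbf{H}^\dagger) \mathbf{x}_c - \log\left( 1 + \mathbf{x}_c^\dagger \mathrm{E}(\mathbf{H}\mathbf{H}^\dagger) \mathbf{x}_c \right) \right\}. \tag{37}$$

In the single-input-multiple-output (SIMO) case, the ARPUC becomes

$$R_c^{\mathrm{SIMO}}(x_c, \mathbf{G}) = \inf_{P_H \in \mathcal{P}_H} \sup_{\theta \geq 0} \frac{1}{|x_c|^2} \left\{ \mathrm{tr}[\mathrm{E}(\mathbf{H}^\dagger\mathbf{H})\mathbf{G}] |x_c|^2 \theta + \mathrm{tr}[\mathbf{G}]\theta + \log\det(\mathbf{I} - \theta\mathbf{G}) \right\}. \tag{38}$$

Furthermore, if $\mathrm{E}(\mathbf{H}^\dagger\mathbf{H}) = \mathbf{\Phi}$ for every $P_H \in \mathcal{P}_H$, we can simplify (38) to

$$R_c^{\mathrm{SIMO}}(x_c, \mathbf{G}) = \sup_{\theta \geq 0} \frac{1}{|x_c|^2} \left\{ \mathrm{tr}[\mathbf{\Phi}\mathbf{G}] |x_c|^2 \theta + \mathrm{tr}[\mathbf{G}]\theta + \log\det(\mathbf{I} - \theta\mathbf{G}) \right\}. \tag{39}$$

Maximizing (39) with respect to $\theta$ and $\mathbf{G}$, we find that the optimal $\mathbf{G}$ is

$$\mathbf{G}^* = \frac{1}{\theta} \left( \mathbf{I} + |x_c|^2 \mathbf{\Phi} \right)^{-1} |x_c|^2 \mathbf{\Phi}, \tag{40}$$

and that the maximum ARPUC is

$$R_c^{\mathrm{SIMO}}(x_c, \mathbf{G}^*) = \mathrm{tr}[\mathbf{\Phi}] - \frac{\log\det(\mathbf{I} + |x_c|^2 \mathbf{\Phi})}{|x_c|^2}, \tag{41}$$

which is identical to the CPUC when the fading matrix $\mathbf{H}$ is circularly symmetric complex Gaussian and the channel input $X$ has a peak constraint $|x_c|$. The results of [8], that orthogonal codes with energy detection achieve the CPUC for general fading Gaussian channels, correspond to a special case treated here, where both the input and the output are scalar, and the peak constraint $|x_c| < \infty$ is removed.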
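The relationship between (39), (40) and (41) can be checked numerically. The following sketch, assuming NumPy is available and using a real symmetric $\mathbf{\Phi}$ for simplicity, evaluates the bracketed objective of (39) at the weighting matrix of (40) and compares it with (41). The function names and the random test covariance are illustrative assumptions.

```python
import numpy as np

def simo_objective(phi, x_abs2, theta, G):
    """Bracketed objective of (39), including the 1/|x_c|^2 factor."""
    n = phi.shape[0]
    _, logdet = np.linalg.slogdet(np.eye(n) - theta * G)
    return (np.trace(phi @ G) * x_abs2 * theta + np.trace(G) * theta + logdet) / x_abs2

def simo_optimum(phi, x_abs2, theta=1.0):
    """Weighting matrix of (40) and the value of (41)."""
    n = phi.shape[0]
    # Solves (I + |x_c|^2 Phi) M = |x_c|^2 Phi, i.e. M = theta * G*.
    G_star = np.linalg.solve(np.eye(n) + x_abs2 * phi, x_abs2 * phi) / theta
    _, logdet = np.linalg.slogdet(np.eye(n) + x_abs2 * phi)
    return G_star, np.trace(phi) - logdet / x_abs2

rng = np.random.default_rng(0)
B = rng.standard_normal((3, 3))
phi = B @ B.T / 3.0  # an arbitrary positive semi-definite covariance
G_star, best = simo_optimum(phi, x_abs2=2.0, theta=1.0)
print(simo_objective(phi, 2.0, 1.0, G_star), best)
```

Only the product $\theta\mathbf{G}$ enters (39), so any positive `theta` yields the same maximum; this is why (40) carries the free factor $1/\theta$.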
Example 5: (Uniform Diagonal Quadratic Receiver)

In this example, we restrict the weighting matrix $\mathbf{G}$ to be an identity matrix $\mathbf{I}_{n_r \times n_r}$. That is, the channel output processing is simply the sum of the squared magnitudes from individual receive antennas, without any correlation among them. Such a structure is thus immune to possible phase offsets among receive antennas, and is easy to implement in antenna arrays. The ARPUC (36) now becomes

$$R_c(\mathbf{x}_c, \mathbf{I}) = \inf_{P_H \in \mathcal{P}_H} \sup_{\theta \geq 0} \frac{1}{\|\mathbf{x}_c\|^2} \left\{ n_r \theta + \theta\, \mathbf{x}_c^\dagger \mathrm{E}[\mathbf{H}\mathbf{H}^\dagger] \mathbf{x}_c + n_r \log(1 - \theta) \right\}. \tag{42}$$

It is apparent that the ARPUC only depends upon the fading covariance $\mathrm{E}[\mathbf{H}\mathbf{H}^\dagger]$. If there is no peak constraint on $\|\mathbf{x}_c\|^2$, the maximum ARPUC is achieved by letting $\|\mathbf{x}_c\|^2 \to \infty$, and given by the following optimization problem:

$$R_c = \sup_{\|\mathbf{x}_c\|^2 = 1} \inf_{P_H \in \mathcal{P}_H} \mathbf{x}_c^\dagger \mathrm{E}[\mathbf{H}\mathbf{H}^\dagger] \mathbf{x}_c. \tag{43}$$

By utilizing the results of minimax robustness in Appendix VII-E as done in Example 1, we can find that, when the fading distribution uncertainty set $\mathcal{P}_H$ is compact and convex, the optimization (43) yields a saddle point, and its solution is $R_c = \min_{P_H \in \mathcal{P}_H} \lambda_{\max}(\mathrm{E}[\mathbf{H}\mathbf{H}^\dagger])$.

C. Photon-Counting Receivers for Poisson Channels

We consider direct detection photon channels in which the channel observation can be modeled as a point process. For such channels without uncertainty, the capacity and error exponents have been fully identified; see [29], [30], [31]. For technical convenience, in this paper we adopt the discrete-time channel model as described in [1, Example 2], and concentrate on the case where the observation is a Poisson process with a fixed background photon flow rate, and the channel input has no bandwidth constraint. For such a channel model, each "channel use" corresponds to a length-$T_0$ time duration, in which the transmitter modulates its input as a function $\rho(t)$, $t \in [0, T_0]$, such that the output is a Poisson point process with rate $\rho(t) + \rho_0$, where $\rho_0$ is the background photon flow rate. Therefore, an ideal photon-counting device within a channel use duration detects a random number of photons that follows a Poisson distribution with parameter $\left( \int_0^{T_0} \rho(t)\,dt + \rho_0 T_0 \right)$. The cost associated with a channel input is equal to $\int_0^{T_0} \rho(t)\,dt$, therefore the zero-cost input symbol is $\rho(t) = 0$, $t \in [0, T_0]$. Let us model the channel state by allowing the fixed background photon flow rate to be uncertain, $\rho_0 \in [\underline{\rho}_0, \overline{\rho}_0]$. In a photon-counting receiver, the channel output processing function $g(\cdot)$ is nothing but the number of photons detected by the ideal photon-counting device within a channel use duration.

Following Proposition 4, we obtain that for any nonzero-cost input $\rho_c(t)$, $t \in [0, T_0]$, the ARPUC for a photon-counting receiver is

$$\begin{aligned}
R_c(\rho_c(t)) &= \inf_{\rho_0} \sup_{\theta \geq 0} \frac{1}{\lambda} \left\{ \theta(\lambda + \rho_0) - \rho_0 (e^\theta - 1) \right\} \\
&= \inf_{\rho_0} \frac{1}{\lambda} \left\{ (\lambda + \rho_0) [\log(\lambda + \rho_0) - \log \rho_0] - \lambda \right\} \\
&= \frac{1}{\lambda} \left\{ (\lambda + \overline{\rho}_0) [\log(\lambda + \overline{\rho}_0) - \log \overline{\rho}_0] - \lambda \right\},
\end{aligned} \tag{44}$$

where $\lambda := (1/T_0) \int_0^{T_0} \rho_c(t)\,dt$. Since the right-hand side of (44) is monotonically increasing with $\lambda$, if we have a peak rate constraint $\rho(t) \leq \rho_1$, the optimal nonzero-cost input function is $\rho_c(t) = \rho_1$ for $t \in [0, T_0]$ and thus $\lambda = \rho_1$. Consequently the photon-counting receiver achieves

$$R_c = (1 + \overline{\rho}_0/\rho_1) \log(1 + \rho_1/\overline{\rho}_0) - 1. \tag{45}$$

Comparing with [1, Example 2], the ARPUC (45) coincides with the CPUC when the channel state is $\rho_0 = \overline{\rho}_0$, the noisiest one. So a photon-counting receiver indeed achieves the CPUC of the compound Poisson channel.
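The steps leading to (44) and (45) can be sketched numerically. The code below, with function names chosen here for illustration, evaluates the objective inside the supremum for a given $\theta$, the closed form obtained at the maximizer $\theta^* = \log(1 + \lambda/\rho_0)$, and the final expression (45).

```python
import math

def poisson_objective(theta, lam, rho0):
    """(1/lam) * {theta*(lam + rho0) - rho0*(exp(theta) - 1)} for a fixed theta."""
    return (theta * (lam + rho0) - rho0 * (math.exp(theta) - 1.0)) / lam

def poisson_arpuc(lam, rho0):
    """Closed form of the supremum over theta for a fixed background rate rho0."""
    return ((lam + rho0) * math.log((lam + rho0) / rho0) - lam) / lam

def compound_poisson_arpuc(rho1, rho0_max):
    """Expression (45): peak input rate rho1, largest background rate rho0_max."""
    return (1.0 + rho0_max / rho1) * math.log(1.0 + rho1 / rho0_max) - 1.0

lam, rho0 = 1.5, 2.0
theta_star = math.log(1.0 + lam / rho0)
print(poisson_objective(theta_star, lam, rho0), poisson_arpuc(lam, rho0))
print([poisson_arpuc(lam, r) for r in (0.5, 1.0, 2.0)])
```

The second printed list should decrease as the background rate grows, which is why the infimum over $\rho_0$ in (44) is attained at the upper end $\overline{\rho}_0$ of the uncertainty interval.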

However, we note that the photon-counting receiver may not attain robustness against non-Poisson point processes. Analogously to the impulsive additive-noise channels considered in Example 3, it is possible to construct impulsive point processes such that the resulting ARPUC with photon-counting receivers is arbitrarily close to zero. Consider a photon channel in which the background photon flow yields only two possible outputs at the ideal photon-counting device. Within a channel use, the device either detects $A$ photons with probability $\lambda_0/A$, or detects no photon, with probability $(1 - \lambda_0/A)$. The integer parameter $A > 0$ may be made arbitrarily large. Note that the average rate of the background photon flow is $\lambda_0$.

Following Proposition 4, we obtain that the corresponding ARPUC is

$$R_c = \sup_{\theta \geq 0} \frac{1}{\lambda} \left\{ \theta(\lambda + \lambda_0) - \log\left[ 1 + \frac{\lambda_0}{A} \left( e^{\theta A} - 1 \right) \right] \right\}. \tag{46}$$

Inspecting (46), we notice that for any $\theta > 0$, as $A \to \infty$, the logarithmic term will eventually exceed the preceding term $\theta(\lambda + \lambda_0)$. Therefore as $A \to \infty$, we have to let $\theta \to 0$, and consequently the ARPUC vanishes asymptotically.
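The vanishing behavior can be illustrated with a short computation. The sketch below assumes the two-point background model just described (a burst of $A$ photons with probability $\lambda_0/A$, none otherwise), whose log moment generating function is $\log[1 - \lambda_0/A + (\lambda_0/A) e^{\theta A}]$, and maximizes the resulting concave objective in closed form. The function name, and the restriction to $A > \lambda + \lambda_0$ under which the maximizer is finite, are assumptions of this sketch.

```python
import math

def impulsive_arpuc(lam, lam0, A):
    """Supremum over theta of (1/lam)*{theta*(lam+lam0) - log(1 - p + p*exp(theta*A))}, p = lam0/A."""
    if A <= lam + lam0:
        raise ValueError("this sketch assumes A > lam + lam0 so that the maximizer is finite")
    p = lam0 / A
    # Setting the derivative in theta to zero gives exp(theta*A) in closed form.
    e_u = (lam + lam0) * (1.0 - p) / (lam0 - p * (lam + lam0))
    theta = math.log(e_u) / A
    return (theta * (lam + lam0) - math.log(1.0 - p + p * e_u)) / lam

for A in (10, 100, 1000, 10000):
    print(A, impulsive_arpuc(1.0, 1.0, A))
```

The printed values should shrink toward zero as $A$ grows, in agreement with the argument that $\theta$ must tend to zero.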

V. CPUC Bounds for Channels with Convex Uncertainty Structure 

In this section, we optimize the ARPUC in Proposition 4 to quantify the ultimate performance of orthogonal codes. The optimization is analytically tractable when the uncertainty set of channel transition statistics satisfies a convexity property, by utilizing results of minimax robustness developed in [12]. To aid the reader, we briefly recapitulate the notation and basic results of minimax robustness in Appendix VII-E.

To avoid additional technicalities, throughout this section we focus on discrete and finite alphabets $\mathcal{X}$ and $\mathcal{Y}$, and assume that the cost function $c(x)$ is strictly positive and finite for every $x \in \mathcal{X}'$. We note that it may sometimes be possible to extend the results to more general alphabets, though care should be taken to verify the corresponding technical conditions.

The following convexity property will be critical in this section.

Definition 3: (Convexity Property of Channel Uncertainty) For any two channel states $s_1 \neq s_2$ in $\mathcal{S}$ and an arbitrary $\alpha \in [0, 1]$, there exists another state $s_\alpha \in \mathcal{S}$, such that $P_{s_\alpha}(y|x) = \alpha P_{s_1}(y|x) + (1 - \alpha) P_{s_2}(y|x)$ for every pair $(x, y) \in \mathcal{X} \times \mathcal{Y}$.

By this definition and the finiteness of the alphabet $\mathcal{Y}$, an immediate observation is that, for every $x \in \mathcal{X}$, the set of conditional channel output distributions, $\mathcal{P}_x := \{P_s(\cdot|x) : s \in \mathcal{S}\}$, is compact and convex on $\mathcal{S}$.

A. Some Information-Theoretic Results 

Before proceeding with the ARPUC formula in Proposition 4, it is useful to conduct an information-theoretic analysis from the general CPUC formula of Proposition 2. For the capacity game of compound channels, the convexity property we have assumed is sufficient to guarantee the existence of a saddle point, and consequently the capacity can be achieved by using a maximum-likelihood decoder designed for the saddle point channel realization [32], [11], [7]. For the CPUC game in Proposition 2 we can also establish a minimax result, as given by the following proposition.

Proposition 5: For a compound memoryless channel $M_c = (\mathcal{X}, \mathcal{Y}, \mathcal{S}, P_\cdot(\cdot|\cdot), c(\cdot))$, in which $\mathcal{X}$ and $\mathcal{Y}$ are discrete and finite sets, consider the following two conditions:

(a) there exists a unique zero-cost symbol $x_f$ in the input alphabet $\mathcal{X}$;

(b) the convexity property in Definition 3.

If (a) is satisfied, then the CPUC is

$$C_c = \sup_{r(\cdot):\, r(x) \geq 0,\ \sum_{x \in \mathcal{X}'} r(x) = 1}\ \min_{s \in \mathcal{S}}\ \sum_{x \in \mathcal{X}'} r(x) \frac{D(P_s(y|x) \| P_s(y|x_f))}{c(x)}. \tag{47}$$

If (b) is satisfied, then the CPUC is

$$C_c = \min_{P_s(\cdot|\cdot):\, s \in \mathcal{S}}\ \sup_{\beta > 0,\ P_X:\ \mathrm{E}[c(X)] \leq \beta} \frac{I(X; Y)}{\beta}. \tag{48}$$

Furthermore, if both (a) and (b) are satisfied, then we have

$$C_c = \min_{s \in \mathcal{S}}\ \sup_{x_c \in \mathcal{X}'} \frac{D(P_{s,c}(y) \| P_{s,f}(y))}{c(x_c)}. \tag{49}$$

Part of the proof of Proposition 5 relies on a general minimax theorem in [33]. We state the theorem in the following form taken from [34, Sec. 5]:

Theorem 1: Let $F$ be a convex subset of a linear topological space $\mathcal{F}$, $Q$ be a compact convex subset of a linear topological space $\mathcal{Q}$, and $U : F \times Q \to \mathbb{R}$ be upper semi-continuous on $F$ and lower semi-continuous on $Q$. Suppose that, (a) for all $q \in Q$ and $\lambda \in \mathbb{R}$, the level set $GE(\lambda, q) := \{f : f \in F,\ U(f, q) \geq \lambda\}$ is convex; and, (b) for all $f \in F$ and $\lambda \in \mathbb{R}$, the level set $LE(f, \lambda) := \{q : q \in Q,\ U(f, q) \leq \lambda\}$ is convex. Then

$$\min_{q \in Q} \sup_{f \in F} U(f, q) = \sup_{f \in F} \min_{q \in Q} U(f, q). \tag{50}$$

In Theorem 1, the property (a) is called quasiconcavity, and the property (b) is called quasiconvexity. These are generalizations of the conventional notions of concavity and convexity, respectively. For compound channel capacity problems, such generalizations are not necessary. However, they are required here for establishing the minimax result for the compound channel CPUC problem.
Proof of Proposition 5:

Proof of (47): We first prove that for any $r(x)$ with $r(x) \geq 0$ and $\sum_{x \in \mathcal{X}'} r(x) = 1$,

$$\min_{s \in \mathcal{S}} \sum_{x \in \mathcal{X}'} r(x) \frac{D(P_s(y|x) \| P_s(y|x_f))}{c(x)}$$

is an ARPUC. For this purpose, we start with the general CPUC formula of Proposition 2 and expand $I(X; Y)$ as [1, Eqn. (10)]

$$I(X; Y) = \sum_{x \in \mathcal{X}'} P_X(x) \cdot D(P_s(y|x) \| P_s(y|x_f)) - D(P_Y(y) \| P_s(y|x_f)). \tag{51}$$

As we let $P_X(x) \to 0$ for all $x \in \mathcal{X}'$, the average cost $\beta \to 0$ since $c(x) < \infty$, and the last term on the right-hand side of (51) vanishes like $o(\beta)$, following the achievability proof of [1, Theorem 3]. Hence we have

$$\frac{I(X; Y)}{\beta} \to \frac{\sum_{x \in \mathcal{X}'} P_X(x) \cdot D(P_s(y|x) \| P_s(y|x_f))}{\sum_{x \in \mathcal{X}'} P_X(x) \cdot c(x)} \tag{52}$$

asymptotically. Then the achievability readily follows as we choose $P_X(x)$ to satisfy

$$r(x) = \frac{P_X(x) \cdot c(x)}{\sum_{x' \in \mathcal{X}'} P_X(x') \cdot c(x')}, \tag{53}$$

for every $x \in \mathcal{X}'$.

We then prove that the CPUC expression in (47) cannot be exceeded by any input distribution. For this purpose, we simply need to upper bound $I(X; Y)$ by

$$I(X; Y) \leq \sum_{x \in \mathcal{X}'} P_X(x) \cdot D(P_s(y|x) \| P_s(y|x_f)),$$

according to (51), and the remaining part is analogous to the converse proof of [1, Theorem 3].

Proof of (48): We shall apply Theorem 1. Let us start with the general formula of $C_c$,

$$C_c = \sup_{\beta > 0,\ P_X:\ \mathrm{E}[c(X)] \leq \beta}\ \inf_{P_s(\cdot|\cdot):\, s \in \mathcal{S}} \frac{I(X; Y)}{\beta} = \sup_{P_X}\ \inf_{P_s(\cdot|\cdot):\, s \in \mathcal{S}} \frac{I(X; Y)}{\mathrm{E}[c(X)]}. \tag{54}$$

The set of input distributions is clearly convex, and by the assumed convexity property, the set of channel transition distributions is compact and convex. Consider the continuity conditions. By the assumption that $c(x) < \infty$ is bounded away from zero for all $x \in \mathcal{X}'$, $I(X; Y)/\mathrm{E}[c(X)]$ is continuous in $P_X$ for any fixed $s \in \mathcal{S}$. By expanding the channel mutual information as $I(X; Y) = \sum_{x \in \mathcal{X}} P(x) D\left( P_s(y|x) \,\|\, \sum_{x' \in \mathcal{X}} P(x') P_s(y|x') \right)$, we have that $I(X; Y)/\mathrm{E}[c(X)]$ is lower semi-continuous in $P_s(\cdot|\cdot)$ [2], for any fixed $P_X$. Finally consider the level sets. For fixed $P_X$ and $\beta$, $I(X; Y)/\beta$ is convex in $P_s(\cdot|\cdot)$ [16]. For fixed $s \in \mathcal{S}$, we need to prove that the corresponding level set, $GE(\lambda, s) = \{\beta > 0, P_X : I(X; Y)/\beta \geq \lambda\}$, is convex. Fix $(\beta_1, P_X^{(1)})$ and $(\beta_2, P_X^{(2)})$ that belong to $GE(\lambda, s)$. Consider

$$\beta_\alpha = \alpha\beta_1 + (1 - \alpha)\beta_2, \qquad P_X^{(\alpha)} = \alpha P_X^{(1)} + (1 - \alpha) P_X^{(2)},$$

for an arbitrary $\alpha \in [0, 1]$. Then we have

$$\frac{I_{P_X^{(\alpha)}}(X; Y)}{\beta_\alpha} \geq \frac{\alpha I_{P_X^{(1)}}(X; Y) + (1 - \alpha) I_{P_X^{(2)}}(X; Y)}{\beta_\alpha} \geq \frac{\alpha\lambda\beta_1 + (1 - \alpha)\lambda\beta_2}{\beta_\alpha} = \lambda,$$

where the first inequality is from the concavity property of mutual information in input distributions [16], and the second inequality is from the definition of $GE(\lambda, s)$. In summary, we have utilized Theorem 1 to establish the minimax equality (48).

Proof of (49): Since we have established (48), the relationship (49) directly follows from applying Proposition 3. Q.E.D.

In the following subsections, we return to the orthogonal coding scheme, to compare its ultimate performance with the CPUC (49).

B. Maximum ARPUC of the Orthogonal Coding Scheme

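We start with the special case in which $\min_{s \in \mathcal{S}} D(P_{s,c}(y) \| P_{s,f}(y)) = \infty$. Since the output alphabet $\mathcal{Y}$ is finite, we can assert that there exists an output $\bar{y} \in \mathcal{Y}$, such that $P_s(\bar{y}|x_c) > 0$ and $P_s(\bar{y}|x_f) = 0$ for all $s \in \mathcal{S}$. Therefore if we let the transformation function be $g(y) = 1$ if $y = \bar{y}$, and $g(y) = 0$ otherwise, the orthogonal coding scheme can achieve any arbitrarily large ARPUC. So in this case we have that the CPUC is $C_c = \infty$.

Now let us turn to the case where $\min_{s \in \mathcal{S}} D(P_{s,c}(y) \| P_{s,f}(y)) < \infty$. For the ARPUC formula in Proposition 4, by formally taking its supremum over $x_c \in \mathcal{X}'$ and $g(\cdot)$, we arrive at

$$R_c = \sup_{x_c \in \mathcal{X}',\ g(\cdot)}\ \inf_{s \in \mathcal{S}}\ \sup_{\theta \geq 0} \frac{1}{c(x_c)} \left\{ \theta \mathrm{E}_{P_{s,c}}[g(Y)] - \log \mathrm{E}_{P_{s,f}}\{\exp[\theta g(Y)]\} \right\}. \tag{55}$$

Since we have assumed the convexity of the uncertainty sets $\mathcal{P}_{x_c}$ and $\mathcal{P}_{x_f}$, we can utilize the minimax robustness results in [12] to verify that the order of $\sup_{g(\cdot)}$ and $\inf_{s \in \mathcal{S}}$ is interchangeable, as shown in Appendix VII-F. Thus,

$$R_c = \sup_{x_c \in \mathcal{X}'}\ \min_{s \in \mathcal{S}}\ \sup_{\theta \geq 0,\ g(\cdot)} \frac{1}{c(x_c)} \left\{ \theta \mathrm{E}_{P_{s,c}}[g(Y)] - \log \mathrm{E}_{P_{s,f}}\{\exp[\theta g(Y)]\} \right\}. \tag{56}$$

For fixed $x_c$ and $(P_{s,f}(y), P_{s,c}(y))$, the inner supremum operator of (56) yields the following solution,

$$\sup_{\theta \geq 0,\ g(\cdot)} \left\{ \theta \mathrm{E}_{P_{s,c}}[g(Y)] - \log \mathrm{E}_{P_{s,f}}\{\exp[\theta g(Y)]\} \right\} = D(P_{s,c}(y) \| P_{s,f}(y)), \tag{57}$$

achieved by

$$g(y) = \log \frac{P_{s,c}(y)}{P_{s,f}(y)}, \quad \text{and } \theta = 1, \tag{58}$$

as shown in Appendix VII-G.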
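The identity (57) with the maximizer (58) can be checked on a small discrete example. In the following sketch the two output distributions are arbitrary illustrative choices, and the helper names are assumptions; the code evaluates the objective of (57) at the log-likelihood ratio with $\theta = 1$ and at randomly drawn alternatives.

```python
import math
import random

def kl(p, q):
    """KL distance D(p||q) between two distributions on the same finite alphabet."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def objective(theta, g, p_c, p_f):
    """theta * E_{P_c}[g(Y)] - log E_{P_f}[exp(theta * g(Y))]."""
    mean_c = sum(pc * gy for pc, gy in zip(p_c, g))
    log_mgf_f = math.log(sum(pf * math.exp(theta * gy) for pf, gy in zip(p_f, g)))
    return theta * mean_c - log_mgf_f

p_c = [0.7, 0.2, 0.1]  # illustrative output law under the nonzero-cost symbol
p_f = [0.2, 0.3, 0.5]  # illustrative output law under the zero-cost symbol
g_llr = [math.log(pc / pf) for pc, pf in zip(p_c, p_f)]
print(objective(1.0, g_llr, p_c, p_f), kl(p_c, p_f))

random.seed(1)
trials = [objective(random.uniform(0.0, 3.0),
                    [random.uniform(-2.0, 2.0) for _ in p_c], p_c, p_f)
          for _ in range(1000)]
print(max(trials))
```

The two numbers on the first printed line should agree, and no random choice of $(\theta, g)$ should exceed them, which is the content of (57).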

So far we have not considered the condition (6) in Proposition 4, which now becomes

$$\inf_{s \in \mathcal{S}} \left\{ \mathrm{E}_{P_{s,c}}\left[ \log \frac{P_{s,c}(Y)}{P_{s,f}(Y)} \right] - \mathrm{E}_{P_{s,f}}\left[ \log \frac{P_{s,c}(Y)}{P_{s,f}(Y)} \right] \right\} > 0, \tag{59}$$

i.e.,

$$\inf_{s \in \mathcal{S}} \left\{ D(P_{s,c}(y) \| P_{s,f}(y)) + D(P_{s,f}(y) \| P_{s,c}(y)) \right\} > 0, \tag{60}$$

and thus holds true whenever $\inf_{s \in \mathcal{S}} D(P_{s,c}(y) \| P_{s,f}(y)) > 0$.

In summary, we establish the maximum ARPUC as given by the following proposition.

Proposition 6: For a compound memoryless channel $M_c = (\mathcal{X}, \mathcal{Y}, \mathcal{S}, P_\cdot(\cdot|\cdot), c(\cdot))$, in which $\mathcal{X}$ and $\mathcal{Y}$ are discrete and finite sets, if:

(a) there exists a unique zero-cost symbol $x_f$ in the input alphabet;

(b) $M_c$ satisfies the convexity property in Definition 3;

then the maximum ARPUC achievable by the orthogonal coding schemes described in Section III is the following lower bound to the CPUC:

$$\underline{C}_c := \sup_{x_c \in \mathcal{X}'}\ \min_{s \in \mathcal{S}} \frac{D(P_{s,c}(y) \| P_{s,f}(y))}{c(x_c)}. \tag{61}$$

Remark: Let us denote the solution to the sup-inf game (61) by $x_c^*$ and $(P_f^*, P_c^*)$. To achieve $\underline{C}_c$ in Proposition 6, the orthogonal code uses the nonzero-cost input symbol $x_c^*$, and the corresponding output transformation is the log-likelihood ratio processing $g^*(y) = \log[P_c^*(y)/P_f^*(y)]$.



The CPUC lower bound $\underline{C}_c$ coincides with the CPUC (49) when the game

$$\left( \mathcal{X}',\ \mathcal{S},\ \frac{D(P_{s,c}(y) \| P_{s,f}(y))}{c(x_c)} \right)$$

has an equilibrium. To verify this, it is possible to utilize either the various forms of minimax theorems (see, e.g., [34] and references therein), or minimax robustness results [12], depending upon the specific channel model encountered.

A comment regarding the computation of $\underline{C}_c$ follows. This KL-distance type CPUC lower bound is in principle simpler to compute than the general CPUC given in Proposition 2. However, its computation may still be numerically non-trivial because of the convexity requirement. For practical purposes, a compound channel model is often described by a certain class of distributions with a few unknown parameters, for example, additive Gaussian noise with an unknown variance. Even if such unknown parameters are from convex sets, the resulting sets of channel transition distributions typically are not convex. Therefore, the requirement of convexity in Definition 3 usually can only be fulfilled by additional convex-hull operations (see, e.g., [35]) upon the original channel uncertainty sets, inevitably leading to mixed probability distributions, for which closed-form expressions for the KL distance rarely exist.

Here we give an example to illustrate that the CPUC lower bound can be strictly smaller than the CPUC.

Example 6: Consider a compound channel with three possible inputs $\{a, b, f\}$ with costs $c(a) = c(b) = 1$ and $c(f) = 0$ (thus $x_f = f$). For each input symbol the possible outputs are $\{a', b'\}$. The conditional output distributions $P(\cdot|a)$ and $P(\cdot|b)$ are unique without any uncertainty, given by

$$P(a'|a) = P(b'|b) = 1 - q, \qquad P(b'|a) = P(a'|b) = q,$$

where $0 < q < 1/2$ is a deterministic number. The conditional output distribution $P(\cdot|f)$ is contained in the convex hull of $P(\cdot|a)$ and $P(\cdot|b)$, represented by $P(\cdot|f) \in \{(P(a'|f), P(b'|f)) = (\delta, 1 - \delta) : q \leq \delta \leq 1 - q\}$. In view of $\underline{C}_c$, we see that for either $x_c = a$ or $b$, it is possible for $P_f$ to coincide with $P_c$. Therefore we have $\underline{C}_c = 0$. On the other hand, for every possible realization of $P(\cdot|f)$, we can choose $x_c$ from $\{a, b\}$ such that $P_c \neq P(\cdot|f)$. Consequently, we can calculate from (49) that $C_c = \log 2 + q \log q + (1 - q) \log(1 - q) > 0$ for $0 < q < 1/2$. Indeed it is easily seen that $C_c$ can be achieved by coding over $\{a, b\}$ only, ignoring the existence of the free symbol $f$.
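The numbers in Example 6 can be reproduced by a brute-force search. The sketch below discretizes the uncertainty interval for $\delta$ and evaluates three quantities: the single-symbol lower bound (61), the CPUC in the min-sup form (49), and the sup-min over mixtures $r(\cdot)$ in the form (47). The function names and grid sizes are illustrative assumptions; since $c(a) = c(b) = 1$, the division by the cost is omitted.

```python
import math

def kl2(p, q):
    """KL distance between two distributions on the binary output alphabet {a', b'}."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def example6(q, n_delta=2001, n_r=101):
    p_a, p_b = (1.0 - q, q), (q, 1.0 - q)
    # Grid over the uncertainty set of P(.|f) = (delta, 1 - delta), q <= delta <= 1 - q.
    deltas = [q + (1.0 - 2.0 * q) * k / (n_delta - 1) for k in range(n_delta)]
    d_a = [kl2(p_a, (d, 1.0 - d)) for d in deltas]
    d_b = [kl2(p_b, (d, 1.0 - d)) for d in deltas]
    lower = max(min(d_a), min(d_b))                   # sup over x_c, min over s, as in (61)
    cpuc = min(max(u, v) for u, v in zip(d_a, d_b))   # min over s, sup over x_c, as in (49)
    mixed = max(min(r * u + (1.0 - r) * v for u, v in zip(d_a, d_b))
                for r in (k / (n_r - 1) for k in range(n_r)))  # sup over r, min over s, as in (47)
    return lower, cpuc, mixed

q = 0.1
print(example6(q))
print(math.log(2.0) + q * math.log(q) + (1.0 - q) * math.log(1.0 - q))
```

On this grid the first quantity should be numerically zero, while the second and third should both match the closed-form value $\log 2 + q \log q + (1 - q)\log(1 - q)$.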

C. Mixed Strategies Achieve the CPUC 

The reader may observe from (47) in Proposition 5 as well as Example 6 that it may be insufficient to use orthogonal codes with only one value of nonzero-cost symbols, because possibly for each nonzero-cost symbol value, the channel uncertainty set has a small (or even vanishing) minimum KL distance to the uncertainty set of the zero-cost symbol. Potential performance improvement, therefore, can be available by extending the orthogonal coding scheme to include different values of nonzero-cost symbols, in the hope that for any possible channel realization there exist some "good" nonzero-cost inputs. To this end, we introduce the mixed strategy as follows (see Figure 2).



Fig. 2. Illustration of an orthogonal codeword with mixed strategy ($n$ columns; the $m$-th row carries the nonzero-cost symbols).



A mixed strategy is parametrized by a function $r(x) \geq 0$, $x \in \mathcal{X}'$, satisfying $\sum_{x \in \mathcal{X}'} r(x) = 1$. Thus $r(\cdot)$ may be interpreted as a probability mass function on $\mathcal{X}'$. For an orthogonal coding scheme that uses the mixed strategy, to represent message $m$, the $n$ elements in the $m$th row of the codeword take $|\mathcal{X}'|$ possible values. Specifically, $n \cdot r(x)$ of the $n$ elements take symbol value $x$, for every $x \in \mathcal{X}'$ (see footnote 3). The $(M - 1)$ rows other than the $m$th row are all $x_f$, as in the orthogonal coding scheme without mixed strategy described in Section III. For notational convenience, we denote the set of column indexes which correspond to input symbol $x$ by $\mathcal{I}_x$. It is obvious that $\bigcup_{x \in \mathcal{X}'} \mathcal{I}_x = \{1, \ldots, n\}$ and $\mathcal{I}_x \cap \mathcal{I}_{x'} = \emptyset$ for any $x' \neq x$. The decoder knows the sets $\mathcal{I}_x$.

The decoding algorithm is a slight modification of that in Section III. For each $x \in \mathcal{X}'$, the decoder chooses a processing transformation function $g_x(y)$ for the channel outputs whose column indexes in the orthogonal codeword block belong to $\mathcal{I}_x$. The receiver then computes the decoding metrics as

$$T_m = \frac{1}{n} \sum_{x \in \mathcal{X}'} \sum_{i \in \mathcal{I}_x} g_x(Y_{m,i}), \quad m = 1, \ldots, M. \tag{62}$$

The decoding rule is to declare the decoded message $\hat{m}$ as $\hat{m} = \arg\max_{m = 1, \ldots, M} T_m$, the same as that in Section III.
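The metric (62) and the argmax rule can be sketched in a few lines. In the code below the data layout (a list of rows of received symbols, a dictionary of column index sets, and a dictionary of per-symbol transformations) and all names are assumptions made for illustration; rows and columns are indexed from zero.

```python
def decode_mixed(Y, index_sets, g):
    """Return (index of the decoded message, list of metrics T_m) following (62)."""
    n = len(Y[0])
    metrics = []
    for row in Y:
        total = 0.0
        for x, cols in index_sets.items():
            # Apply the transformation g_x only on the columns assigned to symbol x.
            total += sum(g[x](row[i]) for i in cols)
        metrics.append(total / n)
    m_hat = max(range(len(Y)), key=lambda m: metrics[m])
    return m_hat, metrics

# Toy usage with two nonzero-cost symbols "a" and "b" and outputs "a'" and "b'".
index_sets = {"a": [0, 1], "b": [2, 3]}
g = {"a": lambda y: 1.0 if y == "a'" else -1.0,
     "b": lambda y: 1.0 if y == "b'" else -1.0}
Y = [["a'", "a'", "b'", "b'"],   # row whose outputs match the transmitted pattern
     ["a'", "b'", "a'", "b'"]]   # row carrying only the zero-cost symbol
print(decode_mixed(Y, index_sets, g))
```

In an actual scheme the transformations $g_x$ would be chosen from the channel model, for example as log-likelihood ratios; the $\pm 1$ scores here only keep the toy example readable.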

Equipped with the convexity property in Definition 3, orthogonal codes with the mixed strategy can in fact be sufficient to achieve the CPUC of compound memoryless channels, as established by the following proposition.

Proposition 7: For a compound memoryless channel $M_c = (\mathcal{X}, \mathcal{Y}, \mathcal{S}, P_\cdot(\cdot|\cdot), c(\cdot))$, in which $\mathcal{X}$ and $\mathcal{Y}$ are discrete and finite sets, if:

(a) there exists a unique zero-cost symbol $x_f$ in the input alphabet;

(b) $M_c$ satisfies the convexity property in Definition 3;

then the maximum ARPUC achievable by the orthogonal coding schemes with mixed strategy coincides with the CPUC as given by (49).

Proof: Following the same line of steps as in establishing Propositions 4 and 6, we can express the maximum ARPUC of orthogonal codes with mixed strategy as

$$C := \sup_{r(\cdot):\ \sum_{x \in \mathcal{X}'} r(x) = 1}\ \min_{s \in \mathcal{S}}\ \sum_{x \in \mathcal{X}'} r(x) \frac{D(P_s(y|x) \| P_s(y|x_f))}{c(x)}. \tag{63}$$

The subsequent proof again relies on Theorem 1. First, the mixed strategies $r(\cdot)$ clearly constitute a convex set, and by the assumed convexity property, the set of channel transition distributions $\mathcal{P}_x$ is compact and convex. Second, the payoff function in (63) is continuous in $r(\cdot)$, and lower semi-continuous on the set of channel transition distributions [2]. Finally, for a fixed mixed strategy $r(\cdot)$, the payoff function is convex in the channel transition distributions [16]. On the other hand, for a fixed channel realization $s \in \mathcal{S}$, the corresponding level set,

$$GE(\lambda, s) = \left\{ r(\cdot) : \sum_{x \in \mathcal{X}'} r(x) \frac{D(P_s(y|x) \| P_s(y|x_f))}{c(x)} \geq \lambda \right\},$$

is obviously a convex set. Therefore by utilizing Theorem 1 we have

$$C = \min_{s \in \mathcal{S}}\ \sup_{r(\cdot)}\ \sum_{x \in \mathcal{X}'} r(x) \frac{D(P_s(y|x) \| P_s(y|x_f))}{c(x)} = \min_{s \in \mathcal{S}}\ \sup_{x_c \in \mathcal{X}'} \frac{D(P_{s,c}(y) \| P_{s,f}(y))}{c(x_c)} = C_c,$$

as given by Proposition 5. Q.E.D.

Footnote 3: Without loss of generality, by letting $n$ be sufficiently large, we assume that $n \cdot r(x)$ is integer-valued for all $x \in \mathcal{X}'$.

Example 6 (cont.): Now let us consider using mixed strategies for the channel in Example 6. It can be verified that the solution to (63) is $r^*(a) = r^*(b) = 1/2$, $(P^*(a'|f), P^*(b'|f)) = (1/2, 1/2)$, and the resulting value of (63) is $\log 2 + q \log q + (1 - q) \log(1 - q)$, which is the same as the CPUC $C_c$ obtained earlier. Therefore, by mixing two nonzero-cost input symbols, we can achieve the CPUC of this compound channel.

VI. Conclusions

For ergodic memoryless channels, when a zero-cost symbol exists in the input alphabet, an orthogonal code is sufficient to asymptotically achieve the CPUC. For compound memoryless channels, however, the ignorance of the channel state realization generally prevents the orthogonal coding scheme from being optimal. By extending the orthogonal decoding algorithm for ergodic memoryless channels, we obtain a class of ARPUC for compound memoryless channels. The utility of this class of ARPUC is illustrated by analyzing several practical receivers for several representative channels. In this paper, we specifically study linear and quadratic receivers for linear additive-noise channels, quadratic receivers for non-coherent fading channels, and photon-counting receivers for Poisson channels.

Under the condition that the uncertainty set of channel transition statistics satisfies a certain convexity property, we systematically optimize the performance of orthogonal codes to obtain a lower bound to the CPUC which involves the KL distance between two conditional output distributions, as well as a minimax game between selecting the optimal nonzero-cost input symbol and selecting the least favorable channel state realization. The CPUC lower bound achieved by an orthogonal code without mixed strategy is tight if the minimax game has an equilibrium. Moreover, we propose a way to extend the orthogonal coding scheme, by allowing a mixed strategy in which an orthogonal code contains an appropriate composition of different nonzero-cost input symbols. Such a mixed strategy improves the ARPUC of orthogonal codes without mixed strategy, and its optimization actually leads to the CPUC, under the convexity condition.

In closing, we briefly comment on some open issues unaddressed in this paper. First, regarding the 
results obtained, it would be desirable to lift some of the technical conditions for further generality. 
Specifically, the assumption of convexity that we utilize in Section V seems to be a crucial prerequisite, but as commented, it also considerably limits the applicability of the resulting ARPUC and CPUC. For example, the results in Section V cannot replace the ad hoc analysis in Section IV, because the channel models examined there generally do not satisfy the convexity condition. Second, as we view
the development in this paper as an initial step toward a full understanding of the robustness issue 
for wideband communication systems, it would be useful to examine more bandwidth-efficient coding 
schemes other than orthogonal codes. Despite its simplicity, the orthogonal coding scheme suffers from 
the slow (sub-exponential) growth rate of the number of messages with the coding block length, implying 
its extremely low bandwidth efficiency. To this end, it will be of considerable interest to examine and 
compare the wideband slopes [3] of different coding schemes for compound channels. 

VII. Appendix 

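A. Proof of Proposition 2

We first show the achievability of $C_c$. The compound channel coding theorem [4, Theorem 1] states that, for every $\beta > 0$ and every $0 < \epsilon < 1$, the rate $C_c(\beta) := \sup_{P_X:\ \mathrm{E}[c(X)] \leq \beta}\ \inf_{P_s(\cdot|\cdot):\, s \in \mathcal{S}} I(X; Y)$ is $\epsilon$-achievable under average cost $\beta$. Therefore, for every fixed $\gamma > 0$, there exists $n_\beta$ such that if $n \geq n_\beta$, then an $(n, M, n\beta, \epsilon)$ code can be found with

$$\frac{\log M}{n} > C_c(\beta) - \frac{\gamma}{2}. \tag{64}$$

The subsequent arguments of the achievability then directly follow [1, Theorem 2].

We then show the converse of $C_c$. From [4, Lemma 6], we obtain that for every possible channel realization $s \in \mathcal{S}$, every $(n, M, \nu, \epsilon)$ code should satisfy

$$\frac{\log M}{\nu} \leq \frac{1}{1 - \epsilon} \left[ \frac{1}{\nu} I(X^n; Y^n) + \frac{\log 2}{\nu} \right]. \tag{65}$$

Considering the least favorable channel realization, we have

$$\frac{\log M}{\nu} \leq \frac{1}{1 - \epsilon} \left[ \frac{1}{\nu} \inf_{P_s(\cdot|\cdot):\, s \in \mathcal{S}} I(X^n; Y^n) + \frac{\log 2}{\nu} \right]. \tag{66}$$

Furthermore, optimizing the distribution of $X$ under the average cost constraint as in [1, proof of Theorem 2], we obtain

$$\frac{\log M}{\nu} \leq \frac{1}{1 - \epsilon} \left[ \sup_{\beta > 0} \frac{1}{\beta} \sup_{P_X:\ \mathrm{E}[c(X)] \leq \beta}\ \inf_{P_s(\cdot|\cdot):\, s \in \mathcal{S}} I(X; Y) + \frac{\log 2}{\nu} \right]. \tag{67}$$

So if $R_c$ is $\epsilon$-achievable per unit cost, then for every $\gamma > 0$, there exists $\nu_0$ such that for any $\nu \geq \nu_0$

$$R_c - \gamma < \frac{1}{1 - \epsilon} \left[ \sup_{\beta > 0,\ P_X:\ \mathrm{E}[c(X)] \leq \beta}\ \inf_{P_s(\cdot|\cdot):\, s \in \mathcal{S}} \frac{I(X; Y)}{\beta} + \frac{\log 2}{\nu} \right]. \tag{68}$$

By letting $\gamma \to 0$, $\epsilon \to 0$, and $\nu \to \infty$, we have

$$R_c \leq \sup_{\beta > 0,\ P_X:\ \mathrm{E}[c(X)] \leq \beta}\ \inf_{P_s(\cdot|\cdot):\, s \in \mathcal{S}} \frac{I(X; Y)}{\beta}, \tag{69}$$

and thus the converse is established.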



B. On the Tightness of the ARPUC in Proposition 4

In this appendix we show that the ARPUC in Proposition 4 is also the maximum ARPUC for the given orthogonal coding scheme. To this end, it suffices to show that for every channel state $s \in \mathcal{S}$, the decoding algorithm cannot achieve arbitrarily small error probability if the growth of $M$ satisfies

$$\frac{\log M}{n} = \sup_{\theta \geq 0} \left[ \theta \mathrm{E}_{P_{s,c}}[g(Y)] - \log \mathrm{E}_{P_{s,f}}[\exp(\theta g(Y))] \right] + \delta, \tag{70}$$

as $n \to \infty$, for any arbitrarily small $\delta > 0$.

For $\max_{m = 2, \ldots, M} T_m$, consider the following lower bound:

$$\begin{aligned}
\Pr\left[ \max_{m = 2, \ldots, M} T_m \geq \mathrm{E}_{P_{s,c}}[g(Y)] \right] &= 1 - \Pr\left[ \max_{m = 2, \ldots, M} T_m < \mathrm{E}_{P_{s,c}}[g(Y)] \right] \\
&= 1 - \left\{ 1 - \Pr\left[ T_2 \geq \mathrm{E}_{P_{s,c}}[g(Y)] \right] \right\}^{M - 1} \\
&\geq 1 - \exp\left\{ -(M - 1) \Pr\left[ T_2 \geq \mathrm{E}_{P_{s,c}}[g(Y)] \right] \right\},
\end{aligned} \tag{71}$$

where the inequality is from $(1 - t)^{M - 1} \leq \exp[-(M - 1)t]$ for $t \in [0, 1]$. By Cramér's theorem (see, e.g., [17, Theorem 2.2.3]), the tail probability $\Pr[T_2 \geq \mathrm{E}_{P_{s,c}}[g(Y)]]$ scales as

$$\liminf_{n \to \infty} \frac{1}{n} \log \Pr\left[ T_2 \geq \mathrm{E}_{P_{s,c}}[g(Y)] \right] \geq -\inf_{t:\ t > \mathrm{E}_{P_{s,c}}[g(Y)]}\ \sup_{\theta \in \mathbb{R}} \left\{ \theta t - \log \mathrm{E}_{P_{s,f}}[\exp(\theta g(Y))] \right\}. \tag{72}$$

Under the additional technical condition that $\log \mathrm{E}_{P_{s,f}}[\exp(\theta g(Y))] < \infty$ for some $\theta > 0$, the bound (72) further reduces to (see, e.g., [17, Lemma 2.2.5])

$$\liminf_{n \to \infty} \frac{1}{n} \log \Pr\left[ T_2 \geq \mathrm{E}_{P_{s,c}}[g(Y)] \right] \geq -\sup_{\theta \geq 0} \left\{ \theta \mathrm{E}_{P_{s,c}}[g(Y)] - \log \mathrm{E}_{P_{s,f}}[\exp(\theta g(Y))] \right\}. \tag{73}$$

Therefore, if the growth of $M$ satisfies (70) for any $\delta > 0$, from (71) and (73) we have

$$\Pr\left[ \max_{m = 2, \ldots, M} T_m \geq \mathrm{E}_{P_{s,c}}[g(Y)] \right] \to 1 \tag{74}$$

as $n \to \infty$. This establishes the tightness of the ARPUC in Proposition 4.



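C. Proof of (25)

Our proof of (25) hinges on the theory of minimax robustness developed in [12]. Since this theory also plays a central technical role in Section V, to aid the reader, we briefly recapitulate notation and key results from [12] in Appendix VII-E. In the context of Example 1, $\mathbf{w}_l$ is the filter, $\mathbf{\Phi} \in \mathcal{S}_\Phi$ is the operating point, and $1/(\mathbf{w}_l^T \mathbf{\Phi} \mathbf{w}_l)$ is the payoff function. We will show that the game has a saddle point, such that

$$R_c = \sup_{\|\mathbf{w}_l\| = 1}\ \inf_{\mathbf{\Phi} \in \mathcal{S}_\Phi} \frac{1}{2 \mathbf{w}_l^T \mathbf{\Phi} \mathbf{w}_l} = \min_{\mathbf{\Phi} \in \mathcal{S}_\Phi}\ \sup_{\|\mathbf{w}_l\| = 1} \frac{1}{2 \mathbf{w}_l^T \mathbf{\Phi} \mathbf{w}_l} = \frac{1}{2 \max_{\mathbf{\Phi} \in \mathcal{S}_\Phi} \lambda_{\min}(\mathbf{\Phi})}. \tag{75}$$

First, by assumption, $\mathcal{S}_\Phi$ is convex, and for every $\mathbf{w}_l$ the payoff function $1/(\mathbf{w}_l^T \mathbf{\Phi} \mathbf{w}_l)$ is convex with respect to $\mathbf{\Phi}$ due to the convexity of the function $f(t) = 1/t$.

The least favorable operating point is easily shown from (75) to be the covariance matrix $\mathbf{\Phi}^* \in \mathcal{S}_\Phi$ that maximizes $\lambda_{\min}(\mathbf{\Phi})$, and the corresponding optimal filter $\mathbf{w}_l^*$ is the associated unit-norm eigenvector.

To complete the proof, we need to show that $(\mathbf{w}_l^*, \mathbf{\Phi}^*)$ is a regular pair. To this end, for every $\mathbf{\Phi} \in \mathcal{S}_\Phi$, consider the neighboring operating point $\mathbf{\Phi}_\alpha = (1 - \alpha)\mathbf{\Phi}^* + \alpha\mathbf{\Phi}$ for small $\alpha \in [0, 1]$. On one hand, the payoff function maximized over $\{\mathbf{w}_l : \|\mathbf{w}_l\| = 1\}$ for a fixed operating point $\mathbf{\Phi}_\alpha$ is $1/\lambda_{\min}(\mathbf{\Phi}_\alpha)$. By utilizing matrix perturbation theory (e.g., [36]), this payoff value behaves like

$$\frac{1}{\lambda_{\min}(\mathbf{\Phi}_\alpha)} = \frac{1}{(1 - \alpha)\lambda_{\min}(\mathbf{\Phi}^*) + \alpha (\mathbf{w}_l^*)^T \mathbf{\Phi} \mathbf{w}_l^* + o(\alpha)} \tag{76}$$

as $\alpha \to 0$. On the other hand, the payoff function for $\mathbf{w}_l^*$ is

$$\frac{1}{(\mathbf{w}_l^*)^T \mathbf{\Phi}_\alpha \mathbf{w}_l^*} = \frac{1}{(1 - \alpha)(\mathbf{w}_l^*)^T \mathbf{\Phi}^* \mathbf{w}_l^* + \alpha (\mathbf{w}_l^*)^T \mathbf{\Phi} \mathbf{w}_l^*} = \frac{1}{(1 - \alpha)\lambda_{\min}(\mathbf{\Phi}^*) + \alpha (\mathbf{w}_l^*)^T \mathbf{\Phi} \mathbf{w}_l^*}. \tag{77}$$

Comparing (76) and (77), we find that their difference scales like $o(\alpha)$ as $\alpha \to 0$, if $\lambda_{\min}(\mathbf{\Phi}^*) = \sup_{\mathbf{\Phi} \in \mathcal{S}_\Phi} \lambda_{\min}(\mathbf{\Phi}) > c$ for some finite positive $c > 0$. Hence $(\mathbf{w}_l^*, \mathbf{\Phi}^*)$ is a regular pair, and by the saddle point property we establish the validity of (25).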



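D. Proof of (32)

From (30) we get

$$R_c(x_c, 1) = \sup_{\theta \geq 0} \frac{1}{x_c^2} \left\{ \theta(x_c^2 + 1) - \log \mathrm{E}[\exp(\theta Z^2)] \right\}. \tag{78}$$

For the mixed Gaussian noise, we can evaluate $\mathrm{E}[\exp(\theta Z^2)]$ as

$$\mathrm{E}[\exp(\theta Z^2)] = \frac{1 - \epsilon}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-(1 - \theta) z^2}\,dz + \frac{\epsilon}{\sqrt{\pi A}} \int_{-\infty}^{\infty} e^{-(1/A - \theta) z^2}\,dz = \frac{1 - \epsilon}{\sqrt{1 - \theta}} + \frac{\epsilon}{\sqrt{1 - A\theta}}, \tag{79}$$

which is finite if and only if $\theta < 1/A < 1$. Then (32) follows from direct manipulations.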
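The closed form (79) can be sanity-checked by numerical integration. The sketch below uses a plain trapezoidal rule on a wide finite interval; the helper names, the integration range and the step count are assumptions chosen for illustration. It also checks that the mixed Gaussian density has unit variance when $A = (1 + \epsilon)/\epsilon$.

```python
import math

def integrate(f, lo, hi, steps):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, steps):
        total += f(lo + k * h)
    return total * h

def mgf_z2_numeric(theta, eps, half_width=60.0, steps=120000):
    """E[exp(theta*Z^2)] for the mixed Gaussian density, with exponents merged to avoid overflow."""
    A = (1.0 + eps) / eps
    def integrand(z):
        return ((1.0 - eps) / math.sqrt(math.pi) * math.exp(-(1.0 - theta) * z * z)
                + eps / math.sqrt(math.pi * A) * math.exp(-(1.0 / A - theta) * z * z))
    return integrate(integrand, -half_width, half_width, steps)

def mgf_z2_closed_form(theta, eps):
    """Right-hand side of (79), valid for theta < 1/A."""
    A = (1.0 + eps) / eps
    if not theta < 1.0 / A:
        raise ValueError("(79) requires theta < 1/A")
    return (1.0 - eps) / math.sqrt(1.0 - theta) + eps / math.sqrt(1.0 - A * theta)

def variance_numeric(eps, half_width=60.0, steps=120000):
    A = (1.0 + eps) / eps
    def integrand(z):
        return z * z * ((1.0 - eps) / math.sqrt(math.pi) * math.exp(-z * z)
                        + eps / math.sqrt(math.pi * A) * math.exp(-z * z / A))
    return integrate(integrand, -half_width, half_width, steps)

print(mgf_z2_numeric(0.1, 0.2), mgf_z2_closed_form(0.1, 0.2))
print(variance_numeric(0.2))
```

The first line should print two nearly identical numbers, and the second should be close to one.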

E. Summary of Notation and Key Results of Minimax Robustness 

The material of this subsection is from [12], and some changes in notation are adopted to meet the 
notational convention of the current paper. 

Denote by $\mathcal{F}$ and $\mathcal{Q}$ two linear topological spaces, called the space of filters and the space of operating points, respectively. The payoff, or utility, function $U$ is a real functional

$$
U : \mathcal{F} \times \mathcal{Q} \to \mathbb{R}. \tag{80}
$$

Suppose that $F \subset \mathcal{F}$ is the set of allowable filters and $Q \subset \mathcal{Q}$ is the set of possible operating points. Let us refer to the triple $(F, Q, U)$ as a game, in which $U$ is maximized over $F$ and minimized over $Q$. We define a minimax robust filter $f_r$ as the filter that solves

$$
\max_{f\in F}\ \inf_{q\in Q} U(f, q).
$$

Its dual is a least favorable operating point $q_l$, defined as the operating point that solves

$$
\min_{q\in Q}\ \sup_{f\in F} U(f, q).
$$

A saddle point solution to the game $(F, Q, U)$ is $(f_l, q_l) \in F \times Q$ that satisfies

$$
U(f, q_l) \le U(f_l, q_l) \le U(f_l, q), \tag{81}
$$

for every $(f, q) \in F \times Q$. Note that if a game has a saddle point, then the corresponding filter is the minimax robust filter for the game, and furthermore the corresponding operating point is the least favorable operating point.

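A pair $(f_l, q_l) \in F \times Q$ is called a regular pair if, for every $q \in Q$ such that $q_\alpha := (1-\alpha) q_l + \alpha q \in Q$ for $\alpha \in [0, 1]$, we have

$$
\sup_{f\in F} U(f, q_\alpha) - U(f_l, q_\alpha) = o(\alpha), \tag{82}
$$

where $o(\alpha)/\alpha \to 0$ as $\alpha \to 0$.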

The following theorem is the key result that we utilize in Section V.
Theorem 2: [12, Theorem 2.1] Suppose that the game $(F, Q, U)$ is such that

(a) $Q$ is a convex set,

(b) $U(f, \cdot)$ is convex on $Q$ for every $f \in F$.

Then, if $(f_l, q_l)$ is a regular pair for $(F, Q, U)$, the following are equivalent:

(1) $q_l$ is a least favorable operating point for $(F, Q, U)$,

(2) $(f_l, q_l)$ is a saddle point solution for $(F, Q, U)$.

In contrast to typical minimax theorems (see, e.g., [34]), Theorem 2 lifts the (quasi-)concavity constraint of $U(f, q)$ in $f$, and its validity in fact requires neither certain topological properties (e.g., compactness) of $\mathcal{F}$ and $\mathcal{Q}$, nor (semi-)continuity properties of $U(f, q)$ on $F$ and $Q$. Indeed, the only essential requirement is the existence of the least favorable operating point. Therefore, under certain circumstances, Theorem 2 appears quite convenient to utilize.

A systematic procedure can be followed in order to apply Theorem 2 for a game $(F, Q, U)$:

1) Verify that the assumptions in Theorem 2 are satisfied.

2) Find the least favorable operating point $q_l$ and the corresponding optimal filter $f^*(q_l)$ that solves $\sup_{f\in F} U(f, q_l)$.

3) Verify that the solved $(f^*(q_l), q_l)$ is a regular pair. The saddle point solution $(f^*(q_l), q_l)$ then yields the minimax robust filter $f_r = f^*(q_l)$.
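
The procedure above can be sketched on a small discretized game. The following Python example is a toy illustration only: the payoff $U(w, \Phi) = 1/(w^T \Phi w)$, the two matrices `A` and `B`, and the grids over filters and operating points are all assumptions made for the sketch (they are not taken from [12] or from the paper), and NumPy is assumed to be available. It locates the least favorable point on the grid, the matching optimal filter, and checks the saddle point inequality (81).

```python
import numpy as np

# Hypothetical 2x2 covariance matrices spanning a convex set (a segment).
A = np.array([[1.0, 0.0], [0.0, 3.0]])
B = np.array([[0.5, 0.2], [0.2, 2.0]])


def payoff(w, phi):
    """Toy payoff U(w, Phi) = 1 / (w^T Phi w); convex in Phi for fixed w."""
    return 1.0 / (w @ phi @ w)


# Allowable filters: unit-norm vectors on a grid of angles.
angles = np.linspace(0.0, np.pi, 181)
filters = [np.array([np.cos(a), np.sin(a)]) for a in angles]
# Operating points: convex combinations (1 - t) * A + t * B.
ts = np.linspace(0.0, 1.0, 101)
points = [(1.0 - t) * A + t * B for t in ts]

# Payoff table: rows index filters, columns index operating points.
U = np.array([[payoff(w, q) for q in points] for w in filters])

maximin = U.min(axis=1).max()          # max over filters of worst-case payoff
minimax = U.max(axis=0).min()          # min over points of best-case payoff
j_lf = int(U.max(axis=0).argmin())     # step 2: least favorable operating point
i_opt = int(U[:, j_lf].argmax())       # step 2: optimal filter at that point

# Saddle point inequality (81) evaluated on the grid.
saddle_ok = bool(U[:, j_lf].max() <= U[i_opt, j_lf] + 1e-12
                 and U[i_opt, j_lf] <= U[i_opt, :].min() + 1e-12)
print(maximin, minimax, ts[j_lf], filters[i_opt], saddle_ok)
```

On a grid the regularity condition of step 3 cannot be verified exactly; the check of (81) is used here as a stand-in for it.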

F. Proof of the equivalence between (55) and (56)

We follow Appendix VII-E to prove the equivalence between (55) and (56). In the problem, $g(\cdot)$ is the filter, $(P_{s,f}, P_{s,c})$ is the operating point, and

$$
U\left(g, (P_{s,f}, P_{s,c})\right) := \sup_{\theta \ge 0} \left\{ \theta E_{P_{s,c}}[g(Y)] - \log E_{P_{s,f}}\{\exp[\theta g(Y)]\} \right\} \tag{83}
$$

is the payoff function.

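Let us first verify the required conditions for the game. By assumption, the set of allowable operating points, $\mathcal{P}_{x_f} \times \mathcal{P}_{x_c}$, is convex. We need to show that for every $g(\cdot)$, the payoff function is convex on $\mathcal{P}_{x_f} \times \mathcal{P}_{x_c}$. Consider two arbitrary $(P_{s,f}, P_{s,c})$ and $(P'_{s,f}, P'_{s,c})$ in $\mathcal{P}_{x_f} \times \mathcal{P}_{x_c}$, and for every $\alpha \in [0, 1]$ the convex combination $(P_{s,f,\alpha}, P_{s,c,\alpha}) := (\alpha P_{s,f} + (1-\alpha) P'_{s,f},\ \alpha P_{s,c} + (1-\alpha) P'_{s,c})$. We have

$$
\begin{aligned}
U\left(g, (P_{s,f,\alpha}, P_{s,c,\alpha})\right)
&= \sup_{\theta \ge 0} \left\{ \theta E_{P_{s,c,\alpha}}[g(Y)] - \log E_{P_{s,f,\alpha}}\{\exp[\theta g(Y)]\} \right\} \\
&= \sup_{\theta \ge 0} \Big\{ \alpha\theta E_{P_{s,c}}[g(Y)] + (1-\alpha)\theta E_{P'_{s,c}}[g(Y)] \\
&\qquad\quad - \log\left\{ \alpha E_{P_{s,f}}\{\exp[\theta g(Y)]\} + (1-\alpha) E_{P'_{s,f}}\{\exp[\theta g(Y)]\} \right\} \Big\} \\
&\le \sup_{\theta \ge 0} \Big\{ \alpha\theta E_{P_{s,c}}[g(Y)] + (1-\alpha)\theta E_{P'_{s,c}}[g(Y)] \\
&\qquad\quad - \alpha \log E_{P_{s,f}}\{\exp[\theta g(Y)]\} - (1-\alpha) \log E_{P'_{s,f}}\{\exp[\theta g(Y)]\} \Big\} \\
&\le \alpha U\left(g, (P_{s,f}, P_{s,c})\right) + (1-\alpha) U\left(g, (P'_{s,f}, P'_{s,c})\right),
\end{aligned} \tag{84}
$$

where the first inequality follows from the concavity of the logarithm, and the second inequality follows because the supremum of a sum is no larger than the sum of the suprema.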
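
The convexity statement in (84) can be spot-checked numerically on a finite alphabet. The Python sketch below is an illustration under assumptions that are not from the paper: a three-letter output alphabet, a hypothetical filter `g`, two hypothetical operating points, and a finite grid `THETAS` standing in for the supremum over $\theta \ge 0$. Because the grid maximum of functions that are convex in the operating point is again convex, the inequality should hold on the grid up to rounding.

```python
import numpy as np

THETAS = np.linspace(0.0, 5.0, 501)  # grid standing in for the supremum over theta >= 0


def payoff(g, p_f, p_c, thetas=THETAS):
    """Grid version of sup_theta {theta * E_{P_c}[g] - log E_{P_f}[exp(theta * g)]}."""
    linear = thetas * np.dot(p_c, g)
    log_mgf = np.log(np.exp(np.outer(thetas, g)) @ p_f)
    return float(np.max(linear - log_mgf))


# Hypothetical filter values and operating points on a three-letter alphabet.
g = np.array([-1.0, 0.0, 2.0])
p_f, p_c = np.array([0.6, 0.3, 0.1]), np.array([0.2, 0.3, 0.5])
p_f2, p_c2 = np.array([0.5, 0.4, 0.1]), np.array([0.1, 0.2, 0.7])

u1 = payoff(g, p_f, p_c)
u2 = payoff(g, p_f2, p_c2)
violations = []
for alpha in np.linspace(0.0, 1.0, 11):
    mixed = payoff(g,
                   alpha * p_f + (1.0 - alpha) * p_f2,
                   alpha * p_c + (1.0 - alpha) * p_c2)
    bound = alpha * u1 + (1.0 - alpha) * u2
    if mixed > bound + 1e-12:          # would contradict (84)
        violations.append(float(alpha))
print("violations of (84):", violations)
```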

The next step is to find the least favorable operating point and the corresponding optimal filter. This is solved in Appendix VII-G. The least favorable operating point is the distribution pair $(P_f^*, P_c^*)$ that minimizes

$$
D\left(P_{s,c} \| P_{s,f}\right), \quad s \in \mathcal{S},
$$

and the corresponding optimal filter is

$$
g^*(y) = \log\frac{P_c^*(y)}{P_f^*(y)}, \quad \text{and} \quad \theta^* = 1.
$$

Finally, we need to verify that $(g^*, (P_f^*, P_c^*))$ is a regular pair. Consider for every $(Q_f, Q_c) \in \mathcal{P}_{x_f} \times \mathcal{P}_{x_c}$ the neighboring operating point

$$
(P_{f,\alpha}, P_{c,\alpha}) = \left((1-\alpha) P_f^* + \alpha Q_f,\ (1-\alpha) P_c^* + \alpha Q_c\right),
$$

for small $\alpha \in [0, 1]$.

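On one hand, by Appendix VII-G,

$$
\sup_{g} U\left(g, (P_{f,\alpha}, P_{c,\alpha})\right) = D\left(P_{c,\alpha} \| P_{f,\alpha}\right) = E_{P_{c,\alpha}}\left[\log\frac{P_{c,\alpha}(Y)}{P_{f,\alpha}(Y)}\right]. \tag{85}
$$

Expanding the right hand side of (85) at $\alpha = 0$, we have

$$
\begin{align}
\frac{d}{d\alpha} E_{P_{c,\alpha}}\left[\log\frac{P_{c,\alpha}(Y)}{P_{f,\alpha}(Y)}\right]\Bigg|_{\alpha=0}
&= E_{Q_c}\left[\log\frac{P_c^*(Y)}{P_f^*(Y)}\right] - E_{P_c^*}\left[\log\frac{P_c^*(Y)}{P_f^*(Y)}\right] + E_{P_c^*}\left[\frac{d}{d\alpha}\log\frac{P_{c,\alpha}(Y)}{P_{f,\alpha}(Y)}\Bigg|_{\alpha=0}\right] \tag{86} \\
&= E_{Q_c}\left[\log\frac{P_c^*(Y)}{P_f^*(Y)}\right] - D(P_c^* \| P_f^*) + E_{P_c^*}\left[\frac{Q_c(Y) - P_c^*(Y)}{P_c^*(Y)} - \frac{Q_f(Y) - P_f^*(Y)}{P_f^*(Y)}\right] \tag{87} \\
&= E_{Q_c}\left[\log\frac{P_c^*(Y)}{P_f^*(Y)}\right] - D(P_c^* \| P_f^*) + 1 - E_{Q_f}\left[\frac{P_c^*(Y)}{P_f^*(Y)}\right]. \tag{88}
\end{align}
$$

Hence the first-order expansion of (85) is

$$
\sup_{g} U\left(g, (P_{f,\alpha}, P_{c,\alpha})\right) = D(P_c^* \| P_f^*) + \left\{ E_{Q_c}\left[\log\frac{P_c^*(Y)}{P_f^*(Y)}\right] - D(P_c^* \| P_f^*) + 1 - E_{Q_f}\left[\frac{P_c^*(Y)}{P_f^*(Y)}\right] \right\} \cdot \alpha + o(\alpha). \tag{89}
$$

On the other hand, noting that for the optimal filter $g^*$ we have $\theta^* = 1$, we obtain

$$
\begin{aligned}
U\left(g^*, (P_{f,\alpha}, P_{c,\alpha})\right)
&= E_{P_{c,\alpha}}\left[\log\frac{P_c^*(Y)}{P_f^*(Y)}\right] - \log E_{P_{f,\alpha}}\left[\frac{P_c^*(Y)}{P_f^*(Y)}\right] \\
&= (1-\alpha) D(P_c^* \| P_f^*) + \alpha E_{Q_c}\left[\log\frac{P_c^*(Y)}{P_f^*(Y)}\right] - \log\left\{ 1 - \alpha + \alpha E_{Q_f}\left[\frac{P_c^*(Y)}{P_f^*(Y)}\right] \right\} \\
&= (1-\alpha) D(P_c^* \| P_f^*) + \alpha E_{Q_c}\left[\log\frac{P_c^*(Y)}{P_f^*(Y)}\right] - \left( E_{Q_f}\left[\frac{P_c^*(Y)}{P_f^*(Y)}\right] - 1 \right) \cdot \alpha + o(\alpha) \\
&= D(P_c^* \| P_f^*) + \left\{ E_{Q_c}\left[\log\frac{P_c^*(Y)}{P_f^*(Y)}\right] - D(P_c^* \| P_f^*) + 1 - E_{Q_f}\left[\frac{P_c^*(Y)}{P_f^*(Y)}\right] \right\} \cdot \alpha + o(\alpha),
\end{aligned} \tag{90}
$$

where we have used $E_{P_f^*}[P_c^*(Y)/P_f^*(Y)] = 1$ and $\log(1 + t) = t + o(t)$ for $|t| \approx 0$.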

Comparing (89) and (90), we find that all the terms, except $o(\alpha)$, cancel out. Hence $(g^*, (P_f^*, P_c^*))$ is a regular pair, and by the saddle point property we establish the validity of (56).
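
The cancellation between (89) and (90) can be observed numerically on a finite alphabet. The Python sketch below is illustrative only: the distributions `P_F_STAR`, `P_C_STAR`, `Q_F`, and `Q_C` are hypothetical (not taken from the paper), and NumPy is assumed to be available. Since the first-order expansions coincide term by term, the gap between the KL distance at the perturbed point and the payoff of the fixed filter $g^*$ with $\theta = 1$ should vanish faster than $\alpha$.

```python
import numpy as np

# Hypothetical distributions on a three-letter alphabet (assumptions for illustration).
P_F_STAR = np.array([0.5, 0.3, 0.2])
P_C_STAR = np.array([0.2, 0.3, 0.5])
Q_F = np.array([0.4, 0.4, 0.2])
Q_C = np.array([0.3, 0.3, 0.4])
G_STAR = np.log(P_C_STAR / P_F_STAR)   # log-likelihood ratio filter g*


def kl(p, q):
    """Kullback-Leibler distance D(p || q) for strictly positive vectors."""
    return float(np.sum(p * np.log(p / q)))


def payoff_theta1(g, p_f, p_c):
    """E_{P_c}[g] - log E_{P_f}[exp(g)], i.e. the payoff expression at theta = 1."""
    return float(np.dot(p_c, g) - np.log(np.dot(p_f, np.exp(g))))


def gap(alpha):
    """D(P_{c,alpha} || P_{f,alpha}) minus the payoff of g* at the perturbed point."""
    p_f = (1.0 - alpha) * P_F_STAR + alpha * Q_F
    p_c = (1.0 - alpha) * P_C_STAR + alpha * Q_C
    return kl(p_c, p_f) - payoff_theta1(G_STAR, p_f, p_c)


alphas = [1e-1, 1e-2, 1e-3]
gaps = [gap(a) for a in alphas]
ratios = [g / a for g, a in zip(gaps, alphas)]  # expected to shrink with alpha
for a, r in zip(alphas, ratios):
    print(f"alpha={a:g}  gap/alpha={r:.3e}")
```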



G. Proof of (57) and (58)

In order to maximize $\left\{\theta E_{P_{s,c}}[g(Y)] - \log E_{P_{s,f}}\{\exp[\theta g(Y)]\}\right\}$ with respect to $\theta$ and $g(\cdot)$, we let its variation regarding $g(\cdot)$ be zero, i.e.,

$$
\theta P_{s,c}(y) - \frac{\theta \exp[\theta g(y)] P_{s,f}(y)}{E_{P_{s,f}}\{\exp[\theta g(Y)]\}} = 0, \quad \text{for every } y \in \mathcal{Y}. \tag{91}
$$

That is,

$$
g(y) = \frac{1}{\theta} \log\frac{P_{s,c}(y)}{P_{s,f}(y)} + \frac{1}{\theta} \log E_{P_{s,f}}\{\exp[\theta g(Y)]\}. \tag{92}
$$

It can then be verified that a solution to (92) is

$$
g(y) = \log\frac{P_{s,c}(y)}{P_{s,f}(y)}, \quad \text{and} \quad \theta = 1, \tag{93}
$$

which then yields the maximum value $D(P_{s,c} \| P_{s,f})$. The result is intuitively apparent, because the log-likelihood ratio is a sufficient statistic conditioned upon a given channel state realization.
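
The claim that the log-likelihood ratio with $\theta = 1$ attains the value $D(P_{s,c} \| P_{s,f})$ can be illustrated with a small numerical experiment. The Python sketch below rests on assumptions that are not from the paper: a hypothetical pair of distributions `p_f`, `p_c` on a four-letter alphabet, and randomly drawn competing filters and values of $\theta$. It evaluates the objective at (93) and compares it against the random competitors.

```python
import numpy as np


def objective(theta, g, p_f, p_c):
    """theta * E_{P_c}[g] - log E_{P_f}[exp(theta * g)] on a finite alphabet."""
    return float(theta * np.dot(p_c, g) - np.log(np.dot(p_f, np.exp(theta * g))))


# Hypothetical distributions for a fixed channel state (illustrative assumption).
p_f = np.array([0.4, 0.3, 0.2, 0.1])
p_c = np.array([0.1, 0.2, 0.3, 0.4])

llr = np.log(p_c / p_f)                    # candidate solution (93)
kl_value = float(np.dot(p_c, llr))         # D(P_c || P_f)
best = objective(1.0, llr, p_f, p_c)       # objective at g = llr, theta = 1

# Random competitors: none should exceed the KL distance.
rng = np.random.default_rng(0)
trials = [objective(rng.uniform(0.0, 3.0), rng.normal(size=p_f.size), p_f, p_c)
          for _ in range(1000)]
print(kl_value, best, max(trials))
```

The random search is not a proof; it only shows that no sampled pair $(\theta, g)$ exceeds the value attained by (93) for these assumed distributions.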
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